Cost Function Simplification

Why are we treating x and y as constants with respect to a when simplifying the cost function? Why is x left but y is removed? Why is a removed?

How familiar are you with the basics of calculus? It might help to revise/review some of those concepts again because it would help answer these questions.

Since we are differentiating with respect to a, we treat x and y as constants because neither of them depends on a (neither is a function of a). From the 7. Linearity Of Differentiation screen in the Finding Extreme Points lesson:

Linearity of differentiation consists of 2 rules… Second is the constant factor rule, which lets us pull out constants from the derivative:

\frac{d}{dx}[cf(x)] = c\frac{d}{dx}[f(x)]

c above is a constant, not because it necessarily has a fixed numerical value, but because it does not depend on x (it is not a function of x). If we change x, c is unaffected. c could be something like y^{3} + y^{2}, and changing x would not affect it at all.
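
As a quick worked illustration of that rule (the choice f(x) = x^{2} is only an assumed example, not something from the lesson), take c = y^{3} + y^{2}:

\frac{d}{dx}[(y^{3} + y^{2})x^{2}] = (y^{3} + y^{2})\frac{d}{dx}[x^{2}] = 2x(y^{3} + y^{2})

The factor y^{3} + y^{2} passes through the derivative untouched, exactly as a number like 5 would.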

Based on that, we differentiate the loss function.

So,

\frac{d}{da}[ax] = x\frac{d}{da}[a] = x (because differentiating a with respect to a will give us 1)

and,

\frac{d}{da}[y] = 0 (because differentiating y with respect to a will give us 0 since y is a constant here in relation to a)

Or, equivalently:

\frac{d}{da}[y] = y\frac{d}{da}[1] = 0
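
If it helps to see the two results above numerically, here is a minimal sketch using a central-difference estimate of the derivative. The helper name `derivative` and the values chosen for x, y and a are arbitrary assumptions for illustration, not part of the lesson.

```python
def derivative(f, a, h=1e-6):
    """Central-difference estimate of df/da at the point a."""
    return (f(a + h) - f(a - h)) / (2 * h)

# Arbitrary illustrative values; neither x nor y depends on a.
x, y = 3.0, 5.0

# d/da [a * x] should come out (approximately) equal to x.
d_ax = derivative(lambda a: a * x, a=2.0)

# d/da [y] should be 0, because y does not change when a changes.
d_y = derivative(lambda a: y, a=2.0)

print(d_ax, x)  # the two values should agree up to rounding error
print(d_y)      # 0.0
```

Changing the value of a in the calls above should not change either result, which is another way of seeing that x and y behave as constants here.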

I would suggest going through the basics of Calculus to make sure this is clear. Some of the introductory courses on Khan Academy should be enough to help you grasp the basics.

Okay, this is very helpful. I’ve been trying to understand the constant factor rule, and your breakdown really helped, especially as it relates to differentiating with respect to a particular variable. Taking this a step further, this is why the derivative of MSE(a0,a1) with respect to a0 simplifies to 2(a0+a1x-y)*1/n, because the inner term differentiates to 1:
d/da0[a0+a1x-y] = d/da0[a0] + d/da0[a1x] - d/da0[y] = 1 + 0 - 0 = 1
The two main points I am taking away:

  1. differentiating a0 with respect to a0 gives 1
  2. differentiating a constant (anything that does not depend on a0) with respect to a0 gives 0
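
Those two takeaways can be sanity-checked numerically. The sketch below assumes the usual definition of MSE for the model a0 + a1*x (the mean of the squared residuals over n points) and compares a finite-difference derivative with respect to a0 against the formula (2/n) * sum(a0 + a1*x - y). The function names and the data are made up for illustration.

```python
def mse(a0, a1, xs, ys):
    """Assumed cost: mean of squared residuals for the model a0 + a1*x."""
    n = len(xs)
    return sum((a0 + a1 * x - y) ** 2 for x, y in zip(xs, ys)) / n

def dmse_da0(a0, a1, xs, ys):
    """Analytic derivative of mse with respect to a0, treating a1, x, y as constants."""
    n = len(xs)
    return 2 / n * sum(a0 + a1 * x - y for x, y in zip(xs, ys))

# Made-up data and parameter values, purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 6.2, 7.9]
a0, a1 = 0.5, 1.5

# Central difference: nudge only a0 and hold everything else fixed.
h = 1e-6
numeric = (mse(a0 + h, a1, xs, ys) - mse(a0 - h, a1, xs, ys)) / (2 * h)
analytic = dmse_da0(a0, a1, xs, ys)

print(numeric, analytic)  # the two values should agree up to rounding error
```

Only a0 is nudged in the finite difference, which mirrors the idea of treating a1, x and y as constants while differentiating with respect to a0.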