Screen Link:

https://app.dataquest.io/m/237/gradient-descent/5/gradient-of-the-cost-function

My Code:

Can the derivative with respect to a0 be explained? When following along on paper and using the chain/power rule, I get:

```
d'(a0) = 2(a0+a1x1^(i) - y^(i)) * d'(a0)(a0 +a1x1^(i) - y^(i))
```

What I expected to happen:

I expect that the remaining derivative term becomes a constant

What actually happened:

```
d'(a0) = 2(a0+a1x1^(i) - y^(i))
```

In the previous example with d’(a1), the derivative term became a constant. In the d’(a0) example, the derivative term simply no longer exists. I am trying to follow along by doing the proof on paper, but I am uncertain what happens to the derivative term in d’(a0).

Yes, that’s correct.

\frac{d}{da_0}MSE(a_0, a_1) = \frac{1}{n}\sum\limits_{i=1}^{n}2*(a_0 + a_1x_1^{(i)} - y^{(i)})*\frac{d}{da_0}(a_0 + a_1x_1^{(i)} - y^{(i)})

Now, for

\frac{d}{da_0}(a_0 + a_1x_1^{(i)} - y^{(i)})

we get

\frac{d}{da_0}a_0 + \frac{d}{da_0}(a_1x_1^{(i)}) - \frac{d}{da_0}y^{(i)}

Since the 2nd and 3rd terms above don’t include a_0, their derivatives with respect to a_0 will be `0`.

And the first term will be equal to `1`.

So, we get

\frac{d}{da_0}MSE(a_0, a_1) = \frac{1}{n}\sum\limits_{i=1}^{n}2*(a_0 + a_1x_1^{(i)} - y^{(i)})

or

\frac{d}{da_0}MSE(a_0, a_1) = \frac{2}{n}\sum\limits_{i=1}^{n}(a_0 + a_1x_1^{(i)} - y^{(i)})
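If you want to convince yourself of this result without doing the algebra, one option is to compare the formula against a numerical derivative. The snippet below is only a minimal sketch: the data in `xs` and `ys` is made up for illustration (it is not from the mission), and the function names are hypothetical.

```python
# Toy data, invented purely for illustration (not from the course).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]


def mse(a0, a1):
    """Mean squared error of the line a0 + a1*x over the toy data."""
    n = len(xs)
    return sum((a0 + a1 * x - y) ** 2 for x, y in zip(xs, ys)) / n


def dmse_da0(a0, a1):
    """Formula from the derivation: (2/n) * sum of (a0 + a1*x - y)."""
    n = len(xs)
    return 2.0 / n * sum(a0 + a1 * x - y for x, y in zip(xs, ys))


def numeric_dmse_da0(a0, a1, h=1e-6):
    """Central finite-difference approximation of d(MSE)/d(a0)."""
    return (mse(a0 + h, a1) - mse(a0 - h, a1)) / (2 * h)


a0, a1 = 0.5, 1.5  # arbitrary point at which to compare the two
analytic = dmse_da0(a0, a1)
numeric = numeric_dmse_da0(a0, a1)
print(analytic, numeric)
```

The two printed numbers should agree up to floating-point rounding, which is what you would expect if the inner derivative really is `1` and simply drops out of the product.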


Okay, from what I am learning, the derivative with respect to a0 of any term that does not contain a0 is zero, e.g. d/da0(a1x1^(i)) = 0, whereas d/da0(a0) = 1.

In the 3rd section of the mission, we differentiated d/da1(a1x1^(i) - y^(i)) and the result was x1. Using the rules above, we get d/da1(a1) * d/da1(x1) - d/da1(y) = 1 * 0 - 0 = 0. This is obviously wrong.

Am I incorrect in breaking apart the term d/da1(a1x1) in the above example?

Correct, you should not “break apart” this term. The rule is that the derivative of a sum is the sum of the derivatives. The same is not true for multiplication (i.e. the derivative of a product **IS NOT** the product of the derivatives). In fact, there is a special formula for the derivative of a product, called the product rule. That said, the product rule is mostly used when both factors depend on the variable, not when one factor is a constant as it is here, although you *could* use it if you wanted. The “better rule” to use here is that when you are differentiating a variable (a1) multiplied by a constant (x1), you can simply “pull the constant out front” and differentiate the variable on its own. In other words: d/da1(a1x1) = (x1)d/da1(a1) = x1 * 1 = x1.
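Here is a rough numerical sketch of the same point, in case it helps to see numbers. The values of `x1` and `a1` are arbitrary and the `derivative` helper is a hypothetical finite-difference approximation, not something from the mission.

```python
def derivative(f, a, h=1e-6):
    """Central finite-difference approximation of f'(a)."""
    return (f(a + h) - f(a - h)) / (2 * h)


x1 = 3.0  # a fixed data value, so a constant with respect to a1 (made up)
a1 = 2.0  # the point at which we differentiate (made up)

# Differentiate the whole term a1 * x1 with respect to a1.
slope = derivative(lambda a: a * x1, a1)

# Derivatives of each factor separately, both with respect to a1.
d_a1 = derivative(lambda a: a, a1)    # derivative of a1 itself, about 1
d_x1 = derivative(lambda a: x1, a1)   # derivative of the constant x1, 0

# Product rule: (u * v)' = u' * v + u * v'
product_rule = d_a1 * x1 + a1 * d_x1

# The mistaken approach: multiplying the two derivatives together.
product_of_derivatives = d_a1 * d_x1

print(slope, product_rule, product_of_derivatives)
```

The first two values should both come out close to `x1`, while the product of the derivatives comes out as zero, which is the incorrect result from the question.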
