3. Derivative of a cost function; 237-3

Hi there,

I’ve been struggling through the intuitive understanding of how we got to the final derivative for MSE.

After applying the power rule & the chain rule together I understand how we’ve gotten to:

The next part confuses me:

If we’re treating y & x as constants, wouldn’t that mean they become zero when we take the derivative? I’ve been searching around YouTube and have really struggled with this for the past couple of days. Any help would be appreciated.

Tangentially, this makes it tough to approach the next problem as I’m not understanding the intuition here.

Apologies if I didn’t tag this properly.


Please format your post appropriately. Right now, most of it is one letter/character per line and it’s unclear what exactly you are trying to ask about. Also, make sure to include the link to the Mission/Mission Step you are referring to.

Thanks. I couldn’t get the mathematical notation to paste properly, so I took screenshots.


It depends. Consider the function $f\colon \mathbb R\rightarrow \mathbb R,\ a\mapsto 17$ (in words, the function defined over the real numbers that is constantly 17). If you differentiate this, you get the function that is constantly zero.

Now consider $g\colon \mathbb R\rightarrow \mathbb R,\ a\mapsto 2a$. Here the constant (relative to your question) is 2. Do you expect to get the zero function when differentiating $g$?
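As a quick numerical sanity check of these two cases (my own illustration, not part of the lesson), a central finite difference approximates the derivative at a point. The helper name `derivative` and the step size `h` are arbitrary choices for this sketch. The constant function gives a slope of 0, while $a\mapsto 2a$ gives a slope of about 2: the constant coefficient survives differentiation.

```python
def derivative(func, a, h=1e-6):
    """Approximate func'(a) with a central finite difference (hypothetical helper)."""
    return (func(a + h) - func(a - h)) / (2 * h)


def f(a):
    # The constant function a -> 17: it never changes, so its slope is 0
    return 17.0


def g(a):
    # a -> 2a: the 2 is a constant *coefficient*, so the slope is 2, not 0
    return 2.0 * a


for a in [-3.0, 0.0, 5.0]:
    print(a, derivative(f, a), round(derivative(g, a), 6))
```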

Going back to the lesson, replace $x^{(i)}$ and $y^{(i)}$ with some actual numbers, just to help you think about this. That is, differentiate, for example,

$$\large a_1\mapsto a_1\cdot \underbrace{2}_{x^{(i)}} - \underbrace{3}_{y^{(i)}}$$

What do you get?

Thanks for the reply Bruno.

For anyone reading this in the future, I’d never seen the notation Bruno posted. Read this: notation - What does the function f: x ↦ y mean? - Mathematics Stack Exchange

Ohhhh ok, I think I get it. Using the actual numbers presented, we get $2a_1 - 3$. The 3 ‘disappears’ when differentiating, and because we’re differentiating with respect to $a_1$, the only thing left is its coefficient 2, i.e. $x^{(i)}$.
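Writing the same step with symbols instead of 2 and 3 shows why $x^{(i)}$ survives while $y^{(i)}$ drops out. This assumes the lesson's cost has the single-parameter form below (the lesson's exact notation may differ):

$$\text{MSE}(a_1) = \frac{1}{n}\sum_{i=1}^{n}\left(a_1 x^{(i)} - y^{(i)}\right)^2$$

The inner function is linear in $a_1$, with $x^{(i)}$ as a constant coefficient and $y^{(i)}$ as a constant term:

$$\frac{d}{da_1}\left(a_1 x^{(i)} - y^{(i)}\right) = x^{(i)} - 0 = x^{(i)}$$

Combining the power rule on the square with the chain rule then gives

$$\frac{d}{da_1}\text{MSE}(a_1) = \frac{1}{n}\sum_{i=1}^{n} 2\left(a_1 x^{(i)} - y^{(i)}\right)\cdot x^{(i)} = \frac{2}{n}\sum_{i=1}^{n} x^{(i)}\left(a_1 x^{(i)} - y^{(i)}\right)$$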

Is my understanding correct? And thank you, I was initially frustrated by your explanation, but it was deceptively helpful. :slight_smile:

Also, for anyone who needs more intuition for gradient descent, StatQuest to the rescue: Gradient Descent, Step-by-Step - YouTube
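To connect the derivative to gradient descent, here is a minimal Python sketch, assuming the single-parameter model $\hat y = a_1 x$ with an MSE cost. The function names, learning rate, step count, and toy data are hypothetical choices for illustration, not the lesson's code. It first checks the analytic derivative against a finite difference, then repeatedly moves $a_1$ against the slope.

```python
def mse(a1, xs, ys):
    # Mean squared error of the one-parameter model y_hat = a1 * x
    n = len(xs)
    return sum((a1 * x - y) ** 2 for x, y in zip(xs, ys)) / n


def mse_derivative(a1, xs, ys):
    # d/da1 MSE = (2/n) * sum of x_i * (a1 * x_i - y_i); x_i and y_i act as constants
    n = len(xs)
    return 2 / n * sum(x * (a1 * x - y) for x, y in zip(xs, ys))


def gradient_descent(xs, ys, a1=0.0, learning_rate=0.01, steps=100):
    # Repeatedly step a1 in the direction that lowers the cost
    for _ in range(steps):
        a1 -= learning_rate * mse_derivative(a1, xs, ys)
    return a1


# Hypothetical toy data generated from y = 3x, so the best a1 is 3
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

# Compare the formula with a central finite difference at a1 = 0.5
h = 1e-6
numeric = (mse(0.5 + h, xs, ys) - mse(0.5 - h, xs, ys)) / (2 * h)
print(mse_derivative(0.5, xs, ys), numeric)
print(gradient_descent(xs, ys))
```

With this data the slope is $15(a_1 - 3)$, so each step with a learning rate of 0.01 shrinks the distance to 3 by 15%; a much larger learning rate would overshoot instead.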

Noob question, but how do I tag the appropriate lesson in the initial question above?

Yes :slight_smile:

You should see a pencil next to the post’s title.


Click on it and then fill in the following box appropriately.