Clarification needed on Gradient Descent

Course Link for reference:
https://app.dataquest.io/c/41/m/237/gradient-descent/1/introduction

I just started learning Gradient Descent today, and I need clarification on a couple of things before I can proceed. I'd appreciate a response.

On page 1, we aim to minimize the following function.

[Screenshot: the cost function as written on page 1]

How did this equation become "y power i" (y^{(i)}) on the next page (page 2)? (Screenshot below for reference.)

[Screenshot: the same function as written on page 2, using the y^{(i)} notation]

On page 3, what is a1_list for, and why are we choosing alpha as 10?

a1_list = [1000]
alpha = 10

In the solution code on the same page, alpha is chosen as .0000003 and a1_initial as 150, and I don't understand why. Please help me understand.

Regards,
Brindha

Just to clarify, it's not "y to the power i" in mathematical terms, in case that was confusing from a mathematical perspective.

It is simply a different way of representing the same thing, though the exact meaning can depend on the context as well.

  • y_i could correspond to a single value or a vector.

  • y^{(i)} could correspond to a single value or a vector.

  • y^{(i)}_m could correspond to a vector y^{(i)} with a specific value in that vector at the m^{th} index.

I have seen variations of these across different content (outside of DQ). So, look at the context of what it’s supposed to represent and move forward from there.
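For example, assuming the cost in that lesson is the usual mean squared error (I'm reconstructing it here; the exact formula on those pages may differ), both notations describe exactly the same quantity:

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}^{(i)} - y^{(i)}\right)^2

The superscript (i) indexes the i^{th} training example; it is not an exponent.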

I haven't gone through that content to be sure, but I would still recommend using the ? button in the top-right corner of the classroom to give them feedback about this, because I think there should be either consistency in the notation or, at least, a clarification when the notation changes.

a1_list is a list that stores the values of a1. In the first iteration, a1 is set to 1000. The derivative is calculated and a new a1 is obtained; that new a1 is then appended to a1_list. In the second iteration, the derivative is calculated at the new a1, and so on. Roughly, the loop looks like the sketch below.
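Here is a minimal sketch of that loop in Python. The data, the mse_derivative helper, and the specific alpha are all my own placeholders, not the lesson's actual values:

import numpy as np

# Made-up data standing in for the lesson's dataset (x could be, say,
# house areas and y their sale prices).
x = np.array([1700.0, 2100.0, 1890.0, 2300.0])
y = np.array([250000.0, 310000.0, 279000.0, 340000.0])

def mse_derivative(a1, x, y):
    # Derivative of (1/n) * sum((a1 * x_i - y_i)^2) with respect to a1,
    # for the single-parameter model y_hat = a1 * x.
    return (2 / len(x)) * np.sum(x * (a1 * x - y))

a1_list = [1000]   # every a1 visited so far, starting from the initial guess
alpha = 0.0000001  # learning rate; a workable value depends on the data's scale

for _ in range(20):
    a1 = a1_list[-1]                                # most recent value of a1
    a1_new = a1 - alpha * mse_derivative(a1, x, y)  # one gradient descent step
    a1_list.append(a1_new)                          # record it

For this made-up data, a1_list[-1] settles around 148, which is the least-squares value of a1.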

The values they chose are just to explain the algorithm to us. Depending on certain factors, we are meant to tune the value of alpha (\alpha) when using gradient descent, and finding a good value is often experimental. I am keeping it simple here because there's more to it, which you will ideally learn later.

That whole concept is likely something they explore later in the course (honestly, I'm not aware whether they do, but it's unlikely they don't, since it's a pretty standard concept).
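To see concretely why alpha = 10 is only for illustration, here is a quick comparison reusing the data and mse_derivative from the sketch above (again, my own placeholder numbers, not the lesson's):

def run(a1_init, alpha, steps=10):
    # Repeatedly apply the gradient descent update and return the final a1.
    a1 = a1_init
    for _ in range(steps):
        a1 = a1 - alpha * mse_derivative(a1, x, y)
    return a1

print(run(1000, 10))     # overshoots on every step; a1 blows up toward overflow
print(run(1000, 1e-7))   # small enough steps; a1 settles near ~148

When alpha is too large, every step overshoots the minimum and the error grows instead of shrinking; when it's too small, the descent converges but very slowly. The solution code's .0000003 and a1_initial = 150 are presumably just values that were found to work well for that lesson's particular dataset and its scale.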


Hello @the_doctor, thank you SO much for the response. I appreciate the detailed explanation. Since yesterday I have been reading about Learning Rate outside of DQ, and I understand there is more to it. I am good for now.
Thank you once again!

Regards,
Brindha