Hyperparameter optimization for Linear regression

Screen Link: https://app.dataquest.io/m/237/gradient-descent/2/single-variable-gradient-descent

The screen mentions that hyperparameter optimization is not covered in this course (screenshot attached). I wanted to know whether this topic is covered for linear regression in another course in the data science path.
If not, could you please share some resources that cover this topic specifically for linear regression?

Hi vinayak.naik87

Following is the screen link for Hyperparameter optimization

Hope this helps.

Thanks for your response. I have seen the topic you mentioned, but it pertains to KNN. I understand that the grid search methodology remains the same; however, my question is more specific to linear regression.
In case of Linear regression (optimization with gradient descent) there will be two main hyperparameters:

  1. Learning rate
  2. Initial parameter selection

My question is mainly about the second hyperparameter, i.e. initial parameter selection.
Are there techniques for initializing these parameters better?
Secondly, does the choice of initial parameters affect model performance, or just the time taken to converge?

Hi @vinayak.naik87,

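I am not sure that the “initial parameter selection” is so important. For linear regression with a mean squared error cost, the cost function is convex, so gradient descent with a suitable learning rate reaches the same minimum from any starting point; the initial values mainly affect how many iterations it takes to get there. Usually you can just use the default parameters and then tune the remaining hyperparameters (such as the learning rate) with grid search.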
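
To check this numerically, here is a minimal sketch using plain NumPy. The data is synthetic, and the function name `gradient_descent`, the learning rate, the tolerance and the starting points are all just assumptions for illustration, not anything from the course. It runs batch gradient descent on the mean squared error from three different starting points and compares each result with the least-squares solution.

```python
import numpy as np

def gradient_descent(X, y, theta0, learning_rate=0.1, tol=1e-8, max_iters=10_000):
    """Batch gradient descent on mean squared error; returns (theta, iterations used)."""
    theta = np.asarray(theta0, dtype=float).copy()
    n = len(y)
    for i in range(max_iters):
        # Gradient of (1/n) * sum((X @ theta - y) ** 2) with respect to theta
        gradient = (2.0 / n) * (X.T @ (X @ theta - y))
        if np.linalg.norm(gradient) < tol:
            return theta, i
        theta -= learning_rate * gradient
    return theta, max_iters

# Synthetic data (assumed for illustration): y = 3 + 2x + small noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])  # intercept column + feature
y = 3.0 + 2.0 * x + rng.normal(scale=0.1, size=200)

# Reference answer from ordinary least squares
closed_form = np.linalg.lstsq(X, y, rcond=None)[0]

# Hypothetical starting points, from close to far away
starts = [(0.0, 0.0), (10.0, -10.0), (-50.0, 50.0)]
results = {start: gradient_descent(X, y, start) for start in starts}

for start, (theta, iters) in results.items():
    print(f"start={start}: theta={theta.round(4)}, iterations={iters}")
print("least squares:", closed_form.round(4))
```

In this setup every start should land on essentially the same parameters; only the iteration count changes. That is specific to convex problems like this one. For non-convex models such as neural networks, initialization can matter for the final result as well.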

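As for the learning rate: according to the scikit-learn documentation, the SGD estimators (for example SGDRegressor) have a learning_rate parameter with 4 different schedules: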

  • ‘constant’: eta = eta0
  • ‘optimal’: eta = 1.0 / (alpha * (t + t0)) where t0 is chosen by a heuristic proposed by Leon Bottou.
  • ‘invscaling’: eta = eta0 / pow(t, power_t)
  • ‘adaptive’: eta = eta0, as long as the training loss keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if early_stopping is True, the current learning rate is divided by 5.
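
If you want to tune the learning rate for a linear model with grid search, something like the following could work. This is only a sketch under my own assumptions: the data is synthetic, the pipeline step names (`scale`, `sgd`) and the grid values are arbitrary, and I left ‘optimal’ out of the grid because it does not use eta0. It uses scikit-learn's `SGDRegressor` inside a `Pipeline` with `GridSearchCV`.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): 3 features, known linear relationship
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=300)

# Scaling the features first usually makes SGD much better behaved
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("sgd", SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)),
])

# Grid over the schedule and the initial learning rate eta0 (values are arbitrary)
schedules = ["constant", "invscaling", "adaptive"]
param_grid = {
    "sgd__learning_rate": schedules,
    "sgd__eta0": [0.001, 0.01, 0.1],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV mean squared error:", -search.best_score_)
```

The same pattern as the KNN lesson applies here; only the estimator and the parameter grid change.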

Hope it helps.