Learning curves in Python

Hi All,

I’m working on an assignment and am a bit stuck, so it would be good if someone could point me in the right direction.

As per the assignment, we have been asked to split the data set into three separate sets: a training set, a validation set and a test set.

The question then states:

Produce a learning curve of the size of training set against the performance
measurements. The performance should be measured on both the training set and the
validation set. You need to choose at least 10 different sizes for the training set. For
example, the first size may be 10% of the total training set produced in Task 3.
• Remember to scale the corresponding training set and the validation set.

I’ve managed to scale the data and split the training set into 14 different sizes. I then fitted a simple linear regression model on each of the 14 subsets so that I can plot the learning curve.
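
For reference, here is a minimal sketch of the procedure described above. It is not the code from the attached notebook. It assumes NumPy only, uses made-up synthetic data, and the helper names `rmse` and `learning_curve_rmse` are hypothetical. The scaling statistics are computed on each training subset and then reused for the validation set, and the model is ordinary least squares with an intercept.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def learning_curve_rmse(X_train, y_train, X_val, y_val, fractions):
    """Return (sizes, train_rmse, val_rmse), one entry per training fraction."""
    sizes, train_scores, val_scores = [], [], []
    for frac in fractions:
        n = max(2, int(round(frac * len(X_train))))
        X_sub, y_sub = X_train[:n], y_train[:n]

        # Fit the scaler on the training subset only, then apply it to both sets.
        mean = X_sub.mean(axis=0)
        std = X_sub.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant columns
        X_sub_s = (X_sub - mean) / std
        X_val_s = (X_val - mean) / std

        # Ordinary least squares with an explicit intercept column.
        A_sub = np.column_stack([np.ones(n), X_sub_s])
        A_val = np.column_stack([np.ones(len(X_val_s)), X_val_s])
        coef, *_ = np.linalg.lstsq(A_sub, y_sub, rcond=None)

        sizes.append(n)
        train_scores.append(rmse(y_sub, A_sub @ coef))
        val_scores.append(rmse(y_val, A_val @ coef))
    return sizes, train_scores, val_scores

# Synthetic data, invented purely for illustration (not the coursework data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=500)
X_train, y_train = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

fractions = np.linspace(0.1, 1.0, 10)  # 10 sizes, starting at 10%
sizes, train_rmse, val_rmse = learning_curve_rmse(
    X_train, y_train, X_val, y_val, fractions
)
for n, tr, va in zip(sizes, train_rmse, val_rmse):
    print(f"n={n:4d}  train RMSE={tr:.3f}  val RMSE={va:.3f}")
```

Plotting `sizes` against `train_rmse` and `val_rmse` (for example with matplotlib) gives the learning curve. With noisy data like this, both RMSE values should end up near the noise level rather than near zero.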

I’m using RMSE as the performance measurement; however, I’m getting very small, odd-looking values when calculating the RMSE on the training set (4.670923877202429e-16) and the validation set (3.670122607849604e-16).
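
Values around 1e-16 are at the level of floating-point round-off, which means the model is fitting the target essentially exactly. One possible cause, which is only a guess and not something confirmed from the notebook, is that the target column is still included among the features. The sketch below reproduces that effect on made-up synthetic data. It assumes NumPy only, and the helper name `fit_rmse` is hypothetical.

```python
import numpy as np

def fit_rmse(X, y):
    """Fit least squares with an intercept and return the training RMSE."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sqrt(np.mean((y - A @ coef) ** 2)))

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=1.0, size=200)

# Features only: the RMSE stays close to the noise level.
clean_rmse = fit_rmse(X, y)

# Target accidentally left in the feature matrix: the model can copy it.
X_leaky = np.column_stack([X, y])
leaky_rmse = fit_rmse(X_leaky, y)

print(f"features only:       RMSE = {clean_rmse:.3f}")
print(f"target in features:  RMSE = {leaky_rmse:.3e}")
```

If the feature matrix in the notebook was built without dropping the target column, removing it would be the first thing to check.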

My learning curve does not look correct (Jupyter notebook provided below). Can someone please help me or point me in the right direction?

Draft_04080177_Coursework.py (33.5 KB)

Have you completed this assignment?