Screen Link: https://app.dataquest.io/m/132/overfitting/5/cross-validation

My Code:

```
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
import numpy as np

# Uncomment to count the '?' entries and replace them with the column mean:
# print((cars['horsepower'] == '?').sum())
# cars['horsepower'] = cars['horsepower'].replace('?', np.nan).astype(float)
# cars['horsepower'] = cars['horsepower'].fillna(cars['horsepower'].mean())

def train_and_cross_val(cols):
    """Return the mean MSE and mean prediction variance over 10 folds."""
    mses = []
    variances = []
    kf = KFold(10, shuffle=True, random_state=3)
    for train, test in kf.split(cars):
        lr = LinearRegression()
        lr.fit(cars[cols].iloc[train], cars['mpg'].iloc[train])
        predictions = lr.predict(cars[cols].iloc[test])
        mses.append(mean_squared_error(cars['mpg'].iloc[test], predictions))
        variances.append(np.var(predictions))
    return (np.average(mses), np.average(variances))

two_mse, two_var = train_and_cross_val(['cylinders', 'displacement'])
three_mse, three_var = train_and_cross_val(['cylinders', 'displacement', 'horsepower'])
four_mse, four_var = train_and_cross_val(['cylinders', 'displacement', 'horsepower', 'weight'])
five_mse, five_var = train_and_cross_val(['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration'])
six_mse, six_var = train_and_cross_val(['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year'])
seven_mse, seven_var = train_and_cross_val(['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin'])
```

If you run my code, you’ll get an error because cars['horsepower'] contains six '?' values, so the series cannot be converted to float and therefore cannot be passed to the LinearRegression object’s fit method. If you uncomment the three commented lines at the top, they first print that there are indeed six '?' values, then replace those values with the mean of the remaining values. The code then runs smoothly, and the resulting variables are very close to the expected values shown in the variable inspector.
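To show the workaround in isolation, here is a minimal sketch on a made-up stand-in for the data. The cars_demo frame and its values are hypothetical, and I’ve swapped in pd.to_numeric with errors="coerce" rather than the replace-then-astype approach in my commented lines; the result should be the same, assuming '?' is the only non-numeric entry.

```python
import pandas as pd

# Hypothetical stand-in for the cars DataFrame; the values are made up.
cars_demo = pd.DataFrame({
    "horsepower": ["130.0", "?", "150.0", "?", "140.0"],
    "mpg": [18.0, 25.0, 16.0, 21.0, 17.0],
})

# Count the placeholder entries before converting.
n_missing = int((cars_demo["horsepower"] == "?").sum())

# errors="coerce" turns anything that cannot be parsed as a number
# (here the "?" strings) into NaN instead of raising an error.
hp = pd.to_numeric(cars_demo["horsepower"], errors="coerce")

# Fill the NaN entries with the mean of the remaining values.
cars_demo["horsepower"] = hp.fillna(hp.mean())

print(n_missing)
print(cars_demo["horsepower"].tolist())
```

After this step the column is float-typed, so it can be passed to fit without the conversion error.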

Copying and pasting the solution code works, but as far as I can tell, there isn’t anything fundamentally different between the solution code and mine. That is, I can’t see anything in the solution code that would have any effect on the '?' values.
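One check that might help narrow this down is to look at the state of the DataFrame right before each version runs. The helper below is a hypothetical sketch (non_numeric_counts and the demo frame are my own names, not part of the exercise); it reports how many entries in each column cannot be parsed as numbers. Running something like it on cars before my code and before the solution code would show whether the two actually start from the same data.

```python
import pandas as pd

def non_numeric_counts(df):
    """Return, per column, how many non-null entries cannot be parsed as numbers."""
    counts = {}
    for col in df.columns:
        parsed = pd.to_numeric(df[col], errors="coerce")
        # An entry counts as non-numeric if it was present but became NaN.
        counts[col] = int((parsed.isna() & df[col].notna()).sum())
    return counts

# Hypothetical demo data; in the exercise this would be the cars DataFrame.
demo = pd.DataFrame({
    "cylinders": [8, 4, 6],
    "horsepower": ["130.0", "?", "95.0"],
})

report = non_numeric_counts(demo)
print(report)
```

If the report shows zero non-numeric entries for horsepower in one run and six in the other, the difference is in how the data was loaded or filtered rather than in the cross-validation code.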