Need help with cross_val_score

Hi all!

Screen Link:
https://app.dataquest.io/m/240/guided-project%3A-predicting-house-sale-prices/1/introduction

I know this is meant to be solved differently, but I wanted to try my luck with sklearn's built-in functions. Here's what I did:

My Code:

new_features = df.loc[:,["Gr Liv Area"]]
new_targets = df.SalePrice
new_features.shape

output:

(2930, 1)

new_targets.shape

output:

(2930,)

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
lr = LogisticRegression(solver="saga")
cross_val_score(lr, new_features, new_targets, cv = 5, scoring="accuracy").mean()

What I expected to happen:
getting a score :slight_smile:

What actually happened:

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/model_selection/_split.py:665: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(("The least populated class in y has only %d"
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/linear_model/_sag.py:329: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn("The max_iter was reached which means "

I tried different solvers, different cv values, and different models… same result :slight_smile: As I understand it, there's something wrong with my target data? I just can't see why it can't be split into 5 folds :confused:

I tried the same thing with Kaggle's Titanic data and it works like a charm!

Would greatly appreciate any hints/ideas.

Many thanks in advance,
Marina

Hi Marina,

Did you try using "neg_root_mean_squared_error" as the scoring parameter in cross_val_score?

I was able to successfully use this in my code.
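
For reference, here is a minimal sketch of roughly what that could look like. It rests on a couple of assumptions: SalePrice is a continuous target, so I swapped LogisticRegression (a classifier, which would treat nearly every distinct price as its own class and likely explains the "least populated class" warning) for LinearRegression, and I used synthetic numbers as a stand-in for the real housing data so the snippet runs on its own. The column names match the post; the generated values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the housing data (made-up values, not the real dataset).
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, size=300)
price = 100 * area + rng.normal(0, 20000, size=300)
df = pd.DataFrame({"Gr Liv Area": area, "SalePrice": price})

new_features = df.loc[:, ["Gr Liv Area"]]
new_targets = df["SalePrice"]

# A continuous target has about as many distinct values as rows,
# which is why a classifier ends up with classes of a single member.
n_unique_prices = new_targets.nunique()

# Use a regressor together with a regression scorer.
lr = LinearRegression()
scores = cross_val_score(
    lr, new_features, new_targets, cv=5, scoring="neg_root_mean_squared_error"
)

# The scorer returns negated RMSE (higher is better), so flip the sign.
rmse = -scores.mean()
print(n_unique_prices, rmse)
```

With the real data, only the synthetic DataFrame part would need to be replaced by your own df. Note that scoring="accuracy" only makes sense for classification, which is probably why the Titanic data worked.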

Hope it helps.