Project: Creating a Kaggle Workflow - Max Iterations Limit

Guided Project: Creating a Kaggle Workflow - https://app.dataquest.io/m/188/guided-project%3A-creating-a-kaggle-workflow/6/selecting-and-tuning-different-algorithms

My Code:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier  # was missing

def select_model(df, features):
    all_X = df[features]
    all_y = df['Survived']
    # Each entry pairs an estimator with the hyperparameter grid to search
    models = [
        {'name': 'LogisticRegression',
         'estimator': LogisticRegression(),
         'hyperparameters': {
             'solver': ['newton-cg', 'lbfgs', 'liblinear']
         }},
        {'name': 'KNeighborsClassifier',
         'estimator': KNeighborsClassifier(),
         'hyperparameters': {
             'n_neighbors': range(1, 20, 2),
             'weights': ['distance', 'uniform'],
             'algorithm': ['ball_tree', 'kd_tree', 'brute'],
             'p': [1, 2]
         }},
        {'name': 'RandomForestClassifier',
         'estimator': RandomForestClassifier(),
         'hyperparameters': {
             'n_estimators': [4, 6, 9],
             'criterion': ['entropy', 'gini'],
             'max_depth': [2, 5, 10],
             'max_features': ['log2', 'sqrt'],
             'min_samples_leaf': [1, 5, 8],
             'min_samples_split': [2, 3, 5]
         }}
    ]
    for model in models:
        print(model['name'])
        print('-' * len(model['name']))
        # 10-fold cross-validated search over every combination in the grid
        grid = GridSearchCV(model['estimator'], param_grid=model['hyperparameters'], cv=10)
        grid.fit(all_X, all_y)
        model['best_params'] = grid.best_params_
        model['best_score'] = grid.best_score_
        model['best_model'] = grid.best_estimator_
        print('Best Model: {}'.format(model['best_model']))
        print('\nBest Score: {}'.format(model['best_score']))
        print('\nBest Parameters: {}\n'.format(model['best_params']))
    # Return after the loop; indented inside it, the function stops after the first model
    return models

# train and cols are defined earlier in the notebook; the result is the list of model dicts
best_model = select_model(train, cols)
```

What I expected to happen:
Return the best score and hyperparameters for each model trained.

What actually happened:
Produced an output for LogisticRegression and then did not run any further:

```
anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
```

Should I be entering an extra keyword argument somewhere for max iterations? Or is something else missing?

The notebook is attached below:

Project - Creating a Kaggle Workflow.ipynb (116.8 KB)


You are working on your local system, which means you are likely using a newer version of scikit-learn than the slightly older one on the DQ platform.

So some differences are bound to pop up. You can either work on the DQ platform, or check the documentation for each model you are using and add the parameter the warning message points to (for LogisticRegression, that is max_iter).
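To answer the max_iter question directly: yes, it is a keyword argument of the LogisticRegression constructor, and its default is 100. Below is a minimal sketch of both fixes the warning suggests (raising max_iter and standardizing the features), assuming synthetic data from make_classification in place of the Titanic train set. The value 1000 is an arbitrary choice for illustration, not a tuned setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the Titanic features
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scale the features, then fit logistic regression with a higher iteration cap
# (max_iter=1000 is an assumed value; the default is 100)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name, so grid keys
# take the form <step name>__<parameter>
param_grid = {'logisticregression__solver': ['newton-cg', 'lbfgs', 'liblinear']}

grid = GridSearchCV(pipe, param_grid=param_grid, cv=10)
grid.fit(X, y)
print(grid.best_params_)
print(grid.best_score_)
```

Inside select_model, the smallest change is replacing LogisticRegression() with LogisticRegression(max_iter=1000). Putting the scaler inside the pipeline, rather than scaling the whole data frame up front, means it is fit only on each training fold during cross-validation.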

Do note that this is a warning, not an error, so the code should still run to completion.
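A small sketch of the difference, again on hypothetical make_classification data: it deliberately caps lbfgs at a single iteration (max_iter=1 is chosen only to provoke the warning) and records any warnings with Python's warnings module. fit() still returns normally, and the model can predict afterwards.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data standing in for the real training set
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# max_iter=1 deliberately starves lbfgs so it stops before converging
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record every warning instead of printing it
    model = LogisticRegression(solver='lbfgs', max_iter=1).fit(X, y)

# The fit still completed: the model has coefficients and can predict
predictions = model.predict(X)

convergence_warnings = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
print(f'ConvergenceWarnings recorded: {len(convergence_warnings)}')
print(f'Predictions made: {len(predictions)}')
```

If you decide the warning is noise for a given experiment, warnings.filterwarnings('ignore', category=ConvergenceWarning) silences it, though raising max_iter or scaling the data is usually the better fix.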

Thank you, that explains it. I think I was a little impatient earlier: letting the code run does return all the outputs, as you highlighted.
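The wait is mostly down to grid size. As a rough sketch, ParameterGrid from sklearn.model_selection can count the candidates in each grid used in select_model. With cv=10, every candidate is fit ten times, plus one final refit per model on the full data, since GridSearchCV refits the best estimator by default.

```python
from sklearn.model_selection import ParameterGrid

# Same grids as in select_model
grids = {
    'LogisticRegression': {'solver': ['newton-cg', 'lbfgs', 'liblinear']},
    'KNeighborsClassifier': {
        'n_neighbors': range(1, 20, 2),
        'weights': ['distance', 'uniform'],
        'algorithm': ['ball_tree', 'kd_tree', 'brute'],
        'p': [1, 2],
    },
    'RandomForestClassifier': {
        'n_estimators': [4, 6, 9],
        'criterion': ['entropy', 'gini'],
        'max_depth': [2, 5, 10],
        'max_features': ['log2', 'sqrt'],
        'min_samples_leaf': [1, 5, 8],
        'min_samples_split': [2, 3, 5],
    },
}
cv_folds = 10

# Number of cross-validation fits per model (excluding the final refit)
fit_counts = {}
for name, grid in grids.items():
    n_candidates = len(ParameterGrid(grid))
    fit_counts[name] = n_candidates * cv_folds
    print(f'{name}: {n_candidates} candidates x {cv_folds} folds = {fit_counts[name]} fits')

print('Total fits:', sum(fit_counts.values()))  # → Total fits: 4470
```

The random forest grid alone accounts for 3240 of those fits. Passing verbose=1 to GridSearchCV makes it report the fold and candidate counts when it starts, and n_jobs=-1 spreads the fits across all CPU cores.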

Thanks again for your help.
