How to gauge overfit with MLPClassifier and cross_val_score?

I’m working on the Guided Project here.

When using MLPClassifier.fit() and MLPClassifier.predict(), I check for overfitting manually by running the training set back through predict() and accuracy_score() and comparing the training accuracy to the test accuracy, as follows…

from datetime import datetime

from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

target = ["target"]
features = [c for c in shuf_df.columns if c not in target]
neurons = [8, 16, 32, 64, 128, 256]
for activation in ["logistic", "relu"]:
    for neuron in neurons:
        mlp = MLPClassifier(hidden_layer_sizes=(neuron,), activation=activation)
        start = datetime.now()
        # Fit on the training set, then score on the held-out test set...
        mlp.fit(train_df[features], train_df["target"])
        nn_predictions = mlp.predict(test_df[features])
        accuracy_test = accuracy_score(test_df["target"], nn_predictions)
        # ...and on the training set itself, so the two accuracies can be compared
        nn_predictions = mlp.predict(train_df[features])
        accuracy_train = accuracy_score(train_df["target"], nn_predictions)
        print("({}) [{}] neuron:{}, accuracy_test:{}, accuracy_train:{}".format(datetime.now() - start,
                                                                                activation,
                                                                                neuron,
                                                                                accuracy_test,
                                                                                accuracy_train))

…which results in…

(0:00:00.750631) [logistic] neuron:8, accuracy_test:0.8611111111111112, accuracy_train:0.9700765483646486 # Probably OK 
(0:00:00.860471) [logistic] neuron:16, accuracy_test:0.8916666666666667, accuracy_train:0.9930410577592206 # Approaching overfit 
(0:00:01.491433) [logistic] neuron:32, accuracy_test:0.8972222222222223, accuracy_train:0.9993041057759221 # Probably overfit 
(0:00:01.951523) [logistic] neuron:64, accuracy_test:0.9166666666666666, accuracy_train:1.0 # overfit 
(0:00:02.449780) [logistic] neuron:128, accuracy_test:0.925, accuracy_train:1.0 # overfit 
(0:00:03.304685) [logistic] neuron:256, accuracy_test:0.925, accuracy_train:1.0 # overfit 
(0:00:00.846773) [relu] neuron:8, accuracy_test:0.8583333333333333, accuracy_train:0.9436325678496869 # Probably OK 
(0:00:00.905262) [relu] neuron:16, accuracy_test:0.8777777777777778, accuracy_train:0.9986082115518441 # Probably overfit 
(0:00:01.531930) [relu] neuron:32, accuracy_test:0.8972222222222223, accuracy_train:1.0 # overfit 
(0:00:01.695193) [relu] neuron:64, accuracy_test:0.9083333333333333, accuracy_train:1.0 # overfit 
(0:00:01.503808) [relu] neuron:128, accuracy_test:0.9027777777777778, accuracy_train:1.0 # overfit 
(0:00:02.060312) [relu] neuron:256, accuracy_test:0.9194444444444444, accuracy_train:1.0 # overfit 

How would I determine overfitting when using cross_val_score, as in the following?…

target = ["target"]
features = [c for c in shuf_df.columns if c not in target]
neurons =[8, 16, 32, 64, 128, 256]
for activation in ["logistic", "relu"]:
    for neuron in neurons:
        mlp = MLPClassifier(hidden_layer_sizes=(neuron,), activation=activation)
        cv_scores = cross_val_score(mlp, shuf_df[features], shuf_df[target], cv=4)
        print("[{}] neuron:{}, cv_scores:{}".format(activation,
                                                    neuron,
                                                    cv_scores, 
                                                    )
        )

It sounds like you want richer output than cross_val_score gives you. Use cross_validate instead: with return_train_score=True it reports a training score for each fold alongside the test score, and a large gap between the two is the same overfitting signal you were computing by hand with the train/test split:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html
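
A minimal sketch of that approach, assuming the same shuf_df, features, target, and neurons names from your question:

from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

for activation in ["logistic", "relu"]:
    for neuron in neurons:
        mlp = MLPClassifier(hidden_layer_sizes=(neuron,), activation=activation)
        # return_train_score=True adds per-fold training accuracy to the result dict
        results = cross_validate(mlp, shuf_df[features], shuf_df["target"],
                                 cv=4, return_train_score=True)
        mean_test = results["test_score"].mean()
        mean_train = results["train_score"].mean()
        print("[{}] neuron:{}, mean_test:{:.3f}, mean_train:{:.3f}, gap:{:.3f}".format(
            activation, neuron, mean_test, mean_train, mean_train - mean_test))

Just as in your manual loop, a training accuracy near 1.0 with a noticeably lower test accuracy points to overfitting; here the gap is averaged over the folds instead of coming from a single split.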