After applying PCA, I get a worse result

Hello, I am working on the final project of the module and I have come across an unexpected situation.

I have applied the multivariate KNN model and reached a score close to 91% on the test set, with the following combination of dimensions and neighbors:

Screen Link:

The code in which I build the loop that searches for the optimum is the following:

My Code:

# Attempt to build a single loop

contador = 0

for ncomp in range(1, len(components_redu)):

    # Generate the PCA-transformed variables
    resultsPCA = execute_PCA(n_components=ncomp, varX=x, varY=y)
    train_PCA = resultsPCA[:111]
    test_PCA = resultsPCA[111:]

    # Split into features and target
    train_data_x = train_PCA.drop(target_col, axis=1)
    train_data_y = pd.DataFrame(train_PCA, columns=[target_col])

    test_data_x = test_PCA.drop(target_col, axis=1)
    test_data_y = pd.DataFrame(test_PCA, columns=[target_col])

    # Scale the x variables: fit the scaler on the training set only,
    # then apply that same transform to the test set (avoids leakage)
    sc = StandardScaler()
    train_data_x = pd.DataFrame(sc.fit_transform(train_data_x), columns=list(train_data_x.columns))
    test_data_x = pd.DataFrame(sc.transform(test_data_x), columns=list(test_data_x.columns))

    # print(train_data_x)
    # print(test_data_x)
    # print("---------")
    for k in range(1, 20):

        for algorithm in algorithms:

            contador += 1

            # Create the object that will contain our algorithm
            knn = KNeighborsRegressor(n_neighbors=k, algorithm=algorithm)

            # Train the model
            knn.fit(train_data_x, train_data_y)

            # Compute the predictions
            predictions = knn.predict(test_data_x)
            # Get the scores
            score_train = knn.score(train_data_x, train_data_y)
            score_test = knn.score(test_data_x, test_data_y)
            # Compute the error via mse and rmse (y_true goes first)
            mse = mean_squared_error(test_data_y, predictions)
            rmse = np.sqrt(mse)
            pcaValues.loc[contador] = pd.Series({'k':k, 'algorithm':algorithm, 'n_components':ncomp, 'score_train':score_train, 'score_test':score_test, 'rmse':rmse})
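For reference, a search like the one above can also be expressed with a scikit-learn `Pipeline` plus `GridSearchCV`, so that the scaler and PCA are refitted on each training fold only. This is a minimal sketch on synthetic data; the real dataset, `execute_PCA`, and the true column layout are not reproduced here, so all shapes and values below are assumptions:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in: 13 correlated columns driven by one latent signal
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.ones((1, 13)) + rng.normal(scale=0.1, size=(200, 13))
y = 3 * latent.ravel() + rng.normal(scale=0.1, size=200)

pipe = Pipeline([
    ("scale", StandardScaler()),   # scaling happens before PCA
    ("pca", PCA()),
    ("knn", KNeighborsRegressor()),
])

# Search over number of components and number of neighbors jointly
grid = GridSearchCV(
    pipe,
    param_grid={
        "pca__n_components": list(range(1, 14)),
        "knn__n_neighbors": list(range(1, 20)),
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The point of the pipeline is that every preprocessing step is fitted inside the cross-validation loop, so the test folds never influence the scaler or the PCA rotation.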

What I expected to happen:

Considering that the best result is obtained with a combination of 13 dimensions, I expected that applying PCA would yield a better result.

What actually happened:

However, the result obtained is worse than the one obtained without applying PCA.

I understand that it is difficult to diagnose the problem without direct access to the code for testing. I would like to know if anyone:

1/ has run PCA on this dataset and obtained a positive result
2/ can think of any reason why the code is not giving a positive result

Note that I have omitted some parts, for example the section in which I create the dataframe that stores the results, to keep the code shown from getting too long.

The best result obtained when applying PCA is the following (in case this information gives some useful hints):

index 464 | k = 2 | algorithm = brute | n_components = 7 | score_train = 0.945310 | score_test = 0.845818 | rmse = 3118.862909

In any case, thank you very much for reviewing it. I look forward to your replies. Greetings!


Hey @Moshe,

This kind of thing happened to me when I was working on a personal project.

I came to know that applying PCA does not guarantee good results. I improved my result by removing columns that do not provide any value. You can check feature importance to decide which features to keep.
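To illustrate the suggestion, here is a hedged sketch of ranking features by a random forest's impurity-based importances. The data and column names (`useful_a`, `noise_a`, etc.) are made up for the example; the real project's columns would go in their place:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic frame: two informative columns, two pure-noise columns
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(300, 4)),
                 columns=["useful_a", "useful_b", "noise_a", "noise_b"])
y = 2 * X["useful_a"] + X["useful_b"] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Higher importance -> the forest relied on that column more for splits
ranking = pd.Series(forest.feature_importances_,
                    index=X.columns).sort_values(ascending=False)
print(ranking)
```

Columns that land near zero in such a ranking are candidates for removal before fitting KNN, since KNN distances are diluted by uninformative dimensions.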


First of all, thanks for answering. In fact, the 91.5% result was obtained by doing a univariate analysis first and then a multivariate one applying the criterion of means.

What surprises me in this case is not reaching any improvement, given the high number of dimensions. Everything seemed to indicate that applying a dimensionality-reduction algorithm (for this many dimensions) would potentially eliminate the noise generated by the correlated dimensions and thus improve the result.
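One way to check that intuition is to look at the cumulative explained variance ratio after scaling: if the dimensions really are highly correlated, most of the variance collapses into a few components. A sketch on synthetic correlated data (not the project's dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 9 observed columns generated from only 3 underlying signals,
# so the columns are heavily correlated by construction
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 9))
X = base @ mixing + 0.05 * rng.normal(size=(200, 9))

# Fit PCA on the standardized data and accumulate explained variance
pca = PCA().fit(StandardScaler().fit_transform(X))
cum = np.cumsum(pca.explained_variance_ratio_)
print(np.round(cum, 3))
```

If the cumulative curve on the real data flattens early, PCA can shed the redundant dimensions; if it rises slowly, the dimensions carry mostly independent information and PCA has little noise to remove.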

In any case, thank you very much for your contribution. Greetings!


No one has any idea? :frowning: