```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def knn_train_test(df, features, target):
    np.random.seed(1)
    tmp_df = df.copy()

    # Randomize order of rows in data frame.
    shuffled_index = np.random.permutation(tmp_df.index)
    tmp_df = tmp_df.reindex(shuffled_index)
    #tmp_df=df.loc[np.random.permutation(len(df))]

    # Split full dataset into train and test sets
    train_df = tmp_df.iloc[0:int(len(tmp_df)*.5)]
    test_df = tmp_df.iloc[int(len(tmp_df*.5)):]

    # Instantiate model
    model = KNeighborsRegressor()
    # Fit a KNN model to the training data (using k=5 default)
    model.fit(train_df[features], train_df[target])
    # Make predictions using model
    predictions = model.predict(test_df[features])
    # Calculate RMSE
    mse = mean_squared_error(test_df[target], predictions)
    rmse = np.sqrt(mse)
    return rmse
```
What I expected to happen:
I want to split the data set into two parts (50% train, 50% test), but I don't understand why my code fails to split it. It only works if I change it to match the solution code:
```python
# Divide number of rows in half and round down.
last_train_row = int(len(rand_df) / 2)
# Select the first half and set as training set.
# Select the second half and set as test set.
train_df = rand_df.iloc[0:last_train_row]
test_df = rand_df.iloc[last_train_row:]
```
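The solution's approach can be sketched as a small reusable helper. This is only an illustration, assuming a pandas DataFrame as input; the function name `split_in_half`, the `seed` parameter, and the toy data are hypothetical. Computing the cut point once from `len(df)` and reusing it for both slices guarantees that the two halves are disjoint and together cover every row.

```python
import numpy as np
import pandas as pd


def split_in_half(df, seed=1):
    """Hypothetical helper: shuffle df's rows and return (train_df, test_df), 50/50."""
    rng = np.random.RandomState(seed)
    # Shuffle row order with a reproducible permutation of the index.
    shuffled = df.reindex(rng.permutation(df.index))
    # Compute the cut point once, from the row count only.
    last_train_row = int(len(shuffled) / 2)
    train_df = shuffled.iloc[0:last_train_row]
    test_df = shuffled.iloc[last_train_row:]
    return train_df, test_df


# Hypothetical toy data: 10 rows, one feature column and one target column.
toy = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2.0})
train_df, test_df = split_in_half(toy)
print(len(train_df), len(test_df))  # 5 5
```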
What actually happened:
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.
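The error appears to come from the test-set line in `knn_train_test`: `len(tmp_df*.5)` multiplies every value in the DataFrame by 0.5 and then takes the length of the result, which has the same number of rows as `tmp_df`. The slice therefore starts after the last row, so `test_df` is empty, which matches the `0 sample(s)` in the error. The sketch below reproduces this on a hypothetical 10-row DataFrame standing in for `tmp_df`; moving the closing parenthesis so the multiplication applies to `len(tmp_df)` gives the intended cut point, the same one the training slice already uses.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row DataFrame standing in for tmp_df.
tmp_df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2.0})

# Buggy: scales the DataFrame's values by 0.5; the row count is unchanged.
buggy_start = int(len(tmp_df * .5))
# Intended: halve the row count, then truncate to an integer.
fixed_start = int(len(tmp_df) * .5)

print(buggy_start, len(tmp_df.iloc[buggy_start:]))  # 10 0
print(fixed_start, len(tmp_df.iloc[fixed_start:]))  # 5 5
```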
It works with this code (the same as the solution):