Predict car prices issues

My Code:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

def knn_train_test(train, target, df):
    shuffled_index = np.random.permutation(df.index)
    rand_df = df.reindex(shuffled_index)
    last_train_row = int(len(rand_df)/2)
    train_df = rand_df.iloc[0:last_train_row]
    test_df = rand_df.iloc[last_train_row:]
    k_value = [5]
    k_rmses = {}

    for k in k_value:
        knn = KNeighborsRegressor(n_neighbors = k)
        knn.fit(train_df[[train]], train_df[target])
        predictions = knn.predict(test_df[[train]])

        mse = mean_squared_error(test_df[target], predictions)
        rmse = mse ** 0.5
        k_rmses[k] = rmse
    return k_rmses

k_rmse_results = {}

for n in range(2,7):
    k_rmse_results['{} best features'.format(n)] = knn_train_test(
        sorted_features[:n], 'price', numeric_cars)
```


What I expected to happen:

What actually happened:

```
TypeError                                 Traceback (most recent call last)
<ipython-input-17-a9ec925a3809> in <module>()
     29 for n in range(2,7):
     30     k_rmse_results['{} best features'.format(n)] = knn_train_test(
---> 31         sorted_features[:n], 'price', numeric_cars)

<ipython-input-17-a9ec925a3809> in knn_train_test(train, target, df)
     15     for k in k_value:
     16         knn = KNeighborsRegressor(n_neighbors = k)
---> 17         knn.fit(train_df[[train]], train_df[target])
     19         predictions = knn.predict(test_df[[train]])

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/ in __getitem__(self, key)
   2131         if isinstance(key, (Series, np.ndarray, Index, list)):
   2132             # either boolean or fancy integer index
-> 2133             return self._getitem_array(key)
   2134         elif isinstance(key, DataFrame):
   2135             return self._getitem_frame(key)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/ in _getitem_array(self, key)
   2175             return self._take(indexer, axis=0, convert=False)
   2176         else:
-> 2177             indexer = self.loc._convert_to_indexer(key, axis=1)
   2178             return self._take(indexer, axis=1, convert=True)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/ in _convert_to_indexer(self, obj, axis, is_setter)
   1254                 # unique index
   1255                 if labels.is_unique:
-> 1256                     indexer = check = labels.get_indexer(objarr)
   1258                 # non-unique (dups)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexes/ in get_indexer(self, target, method, limit, tolerance)
   2700                                  'backfill or nearest reindexing')
-> 2702             indexer = self._engine.get_indexer(target._values)
   2704         return _ensure_platform_int(indexer)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_indexer()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.lookup()

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexes/ in __hash__(self)
   1720     def __hash__(self):
-> 1721         raise TypeError("unhashable type: %r" % type(self).__name__)
   1723     def __setitem__(self, key, value):

TypeError: unhashable type: 'Index'
```

I checked the index several times and can’t see anything wrong…

sorted_features[:2] returns something like the following:

Index(['engine-size', 'horsepower'], dtype='object')

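When you select columns from a DataFrame with an Index like the one above, you don’t use double brackets: the Index is already a sequence of column names. The extra brackets create a list whose only element is the Index itself, so pandas tries to look up the whole Index as a single column label. An Index is unhashable, hence TypeError: unhashable type: 'Index'.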
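A minimal sketch of that point, using a made-up stand-in for numeric_cars and a hypothetical sorted_features Index (the numbers are invented; only the column names come from this thread). It selects the columns with single brackets and also shows the root cause of the traceback: an Index object cannot be hashed, so it cannot serve as one column label.

```python
import pandas as pd

# Hypothetical stand-in for the poster's numeric_cars DataFrame (made-up values).
numeric_cars = pd.DataFrame({
    "engine-size": [130, 152, 109, 136],
    "horsepower": [111, 154, 102, 115],
    "curb-weight": [2548, 2823, 2337, 2824],
    "price": [13495, 16500, 13950, 17450],
})

# Hypothetical: features ranked best-first, stored as a pandas Index.
sorted_features = pd.Index(["engine-size", "horsepower", "curb-weight"])

features = sorted_features[:2]   # slicing an Index gives another Index
subset = numeric_cars[features]  # single brackets: it is already a sequence of labels
print(type(features).__name__, subset.shape)  # Index (4, 2)

# This is why numeric_cars[[features]] failed in the traceback: the extra
# brackets make the whole Index one label, and an Index cannot be hashed.
try:
    hash(features)
except TypeError as err:
    print(err)  # unhashable type: 'Index'
```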

So you don’t write it as train_df[[train]]; it should be single brackets only: train_df[train].

That’s the source of the error. Updating that (and also the one in the next code line for predictions) should fix the issue.
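For reference, here is a sketch of knn_train_test with that fix applied in both fit and predict. It keeps the question’s random 50/50 split and k = 5, and it assumes, as in the question’s loop, that train is a list or pandas Index of column names rather than a single string. The demo frame at the end is made-up data, not the car dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor


def knn_train_test(train, target, df):
    """Return {k: RMSE} for a k-NN regressor on a random 50/50 split.

    `train` is expected to be a list or pandas Index of column names
    (e.g. sorted_features[:n]), so it is used with single brackets.
    """
    shuffled_index = np.random.permutation(df.index)
    rand_df = df.reindex(shuffled_index)
    last_train_row = int(len(rand_df) / 2)
    train_df = rand_df.iloc[0:last_train_row]
    test_df = rand_df.iloc[last_train_row:]

    k_rmses = {}
    for k in [5]:
        knn = KNeighborsRegressor(n_neighbors=k)
        knn.fit(train_df[train], train_df[target])   # single brackets
        predictions = knn.predict(test_df[train])    # single brackets here too
        mse = mean_squared_error(test_df[target], predictions)
        k_rmses[k] = mse ** 0.5
    return k_rmses


# Hypothetical demo data standing in for numeric_cars.
rng = np.random.RandomState(0)
demo = pd.DataFrame(rng.rand(20, 3), columns=["engine-size", "horsepower", "price"])
demo_result = knn_train_test(pd.Index(["engine-size", "horsepower"]), "price", demo)
print(demo_result)
```

The loop from the question can then run unchanged. Because the split is random, the RMSE values differ between runs; calling np.random.seed(...) before the loop would make them repeatable.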

But if you pass only one feature, as a single string column name, you must use double brackets: with single brackets and one column name, pandas returns a Series instead of a DataFrame, and the method raises an error.
The devil is in the detail
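To make the reply above concrete: with one column name, single brackets return a 1-D Series, while double brackets return a 2-D DataFrame with one column. scikit-learn estimators such as KNeighborsRegressor expect a 2-D X in fit, so the Series form is what recent scikit-learn versions reject. The frame below is a made-up two-row example.

```python
import pandas as pd

# Hypothetical two-row example; only the column names matter here.
df = pd.DataFrame({"horsepower": [111, 154], "price": [13495, 16500]})

one_bracket = df["horsepower"]     # one label -> Series (1-D)
two_brackets = df[["horsepower"]]  # list holding one label -> DataFrame (2-D)

print(type(one_bracket).__name__, one_bracket.shape)    # Series (2,)
print(type(two_brackets).__name__, two_brackets.shape)  # DataFrame (2, 1)
```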

I understand that I should use double brackets for a string column name, but what do you mean by the part about one bracket giving a Series instead of a DataFrame and raising the error?

From what I understand: double brackets for a string column name, single brackets for extracting data from a DataFrame…

Use single brackets when the argument is already a list (or Index) of several column names. If the argument is a single column name, use double brackets: with single brackets and one column name, pandas returns a Series, not the DataFrame that is required.
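That rule of thumb can be folded into a small helper. as_column_list is a hypothetical name, not part of pandas: it wraps a single string in a list and turns a list or Index into a plain list, so selecting with single brackets always returns a DataFrame. The data is made up.

```python
import pandas as pd


def as_column_list(columns):
    """Hypothetical helper: normalize a column selection to a list of labels.

    A single name (str) is wrapped in a list; a list or pandas Index is
    converted to a plain list. df[as_column_list(x)] is then a DataFrame.
    """
    if isinstance(columns, str):
        return [columns]
    return list(columns)


df = pd.DataFrame({
    "engine-size": [130, 152],
    "horsepower": [111, 154],
    "price": [13495, 16500],
})

print(df[as_column_list("horsepower")].shape)                             # (2, 1)
print(df[as_column_list(["engine-size", "horsepower"])].shape)            # (2, 2)
print(df[as_column_list(pd.Index(["engine-size", "horsepower"]))].shape)  # (2, 2)
```

Inside knn_train_test, writing train_df[as_column_list(train)] would then work whether train is 'horsepower' or sorted_features[:n].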