Predicting House Sale Prices

Screen Link: https://github.com/dataquestio/solutions/blob/master/Mission240Solutions.ipynb

When I get to the feature selection part and calculate the correlation coefficients with the SalePrice column, I get the error below. I'm not sure why.

My Code:
    abs_corr_coeffs = numerical_df.corr()['SalePrice'].abs().sort_values()
    abs_corr_coeffs

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    ~\Anaconda3\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
       3077             try:
    -> 3078                 return self._engine.get_loc(key)
       3079             except KeyError:

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

    KeyError: 'SalePrice'

    During handling of the above exception, another exception occurred:

    KeyError                                  Traceback (most recent call last)
    <ipython-input-40-fef770f21a34> in <module>()
    ----> 1 abs_corr_coeffs = numerical_df.corr()['SalePrice'].abs().sort_values()
          2 abs_corr_coeffs

    ~\Anaconda3\Anaconda\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
       2686             return self._getitem_multilevel(key)
       2687         else:
    -> 2688             return self._getitem_column(key)
       2689 
       2690     def _getitem_column(self, key):

    ~\Anaconda3\Anaconda\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
       2693         # get column
       2694         if self.columns.is_unique:
    -> 2695             return self._get_item_cache(key)
       2696 
       2697         # duplicate columns & possible reduce dimensionality

    ~\Anaconda3\Anaconda\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
       2487         res = cache.get(item)
       2488         if res is None:
    -> 2489             values = self._data.get(item)
       2490             res = self._box_item_values(item, values)
       2491             cache[item] = res

    ~\Anaconda3\Anaconda\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
       4113 
       4114             if not isna(item):
    -> 4115                 loc = self.items.get_loc(item)
       4116             else:
       4117                 indexer = np.arange(len(self.items))[isna(self.items)]

    ~\Anaconda3\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
       3078                 return self._engine.get_loc(key)
       3079             except KeyError:
    -> 3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))
       3081 
       3082         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

    KeyError: 'SalePrice'

I expected to get:
BsmtFin SF 2 0.006127
Misc Val 0.019273
Yr Sold 0.030358
3Ssn Porch 0.032268
Bsmt Half Bath 0.035875
Low Qual Fin SF 0.037629
Pool Area 0.068438
MS SubClass 0.085128
Overall Cond 0.101540
Screen Porch 0.112280
Kitchen AbvGr 0.119760
Enclosed Porch 0.128685
Bedroom AbvGr 0.143916
Bsmt Unf SF 0.182751
Lot Area 0.267520
2nd Flr SF 0.269601
Bsmt Full Bath 0.276258
Half Bath 0.284871
Open Porch SF 0.316262
Wood Deck SF 0.328183
BsmtFin SF 1 0.439284
Fireplaces 0.474831
TotRms AbvGrd 0.498574
Mas Vnr Area 0.506983
Years Since Remod 0.534985
Full Bath 0.546118
Years Before Sale 0.558979
1st Flr SF 0.635185
Garage Area 0.641425
Total Bsmt SF 0.644012
Garage Cars 0.648361
Gr Liv Area 0.717596
Overall Qual 0.801206
SalePrice 1.000000
Name: SalePrice, dtype: float64

A KeyError on a DataFrame is raised when the requested column name is not found in that DataFrame. You have probably deleted or renamed the column by mistake. You can confirm this by printing the column names with print(numerical_df.columns).
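As a minimal sketch of that check, the snippet below uses a made-up stand-in for numerical_df (the two columns and their values are assumptions for illustration, not the real housing data) and shows that indexing the correlation matrix by a missing column raises the same KeyError:

```python
import pandas as pd

# Toy stand-in for numerical_df; the values are invented for illustration
# and the target column is deliberately left out.
numerical_df = pd.DataFrame({
    "Gr Liv Area": [1500, 2000, 1200, 1800],
    "Overall Qual": [6, 8, 5, 7],
})

target = "SalePrice"

# List the columns and test for the target before indexing the correlation matrix.
print(numerical_df.columns.tolist())
has_target = target in numerical_df.columns
print("target present:", has_target)

# Indexing by a missing column label raises KeyError, as in the traceback above.
caught = None
try:
    numerical_df.corr()[target]
except KeyError as err:
    caught = str(err)
    print("KeyError:", caught)
```

If the membership test prints False, the column was dropped or renamed somewhere earlier in the notebook, and that earlier cell is the place to look.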

Here is my notebook:
Predicting House Sale Prices.ipynb (26.5 KB)

It looks like I'm doing the same thing as the solution (the SalePrice column is dropped), but I'm still getting the error. Am I missing something?


Hi @feedmyboxtoday,

As @DishinGoyani suggested, the SalePrice column is missing from numerical_df, the DataFrame you created to hold only the int and float columns.

Review the output of code cells 39 and 44.
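For comparison, here is a rough sketch of the correlation step when the target is still present. The DataFrame df and its values are invented for illustration, and select_dtypes(include="number") is used here as one way to keep only the numeric columns; your notebook may build numerical_df differently.

```python
import pandas as pd

# Hypothetical miniature of the housing data; values are made up.
df = pd.DataFrame({
    "Gr Liv Area": [1000, 1500, 2000, 2500],
    "Overall Cond": [5, 7, 4, 7],
    "Neighborhood": ["A", "B", "A", "C"],
    "SalePrice": [100000, 150000, 200000, 250000],
})

# Keep only numeric columns. SalePrice is numeric, so it survives this step
# as long as it has not been dropped from df beforehand.
numerical_df = df.select_dtypes(include="number")
assert "SalePrice" in numerical_df.columns

# Absolute correlation of every numeric column with the target, ascending.
abs_corr_coeffs = numerical_df.corr()["SalePrice"].abs().sort_values()
print(abs_corr_coeffs)
```

If the target has to be dropped from the feature set, doing so after this step (for example, just before fitting the model) keeps it available for the correlation lookup.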