What was wrong with the way I coded it originally?

Screen Link:
https://app.dataquest.io/m/144/bar-plots-and-scatter-plots/2/introduction-to-the-data

My Code:

import pandas as pd

reviews = pd.read_csv('fandango_scores.csv')

norm_reviews = reviews['FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']

print(norm_reviews[:1])

What I expected to happen:
I expected the code to run. Maybe I didn’t fully internalize some of the material on a previous mission.

Why do I need to create a separate variable called cols instead of just “cutting out the middle-man”, so to speak, and replacing the value for the cols variable directly into the code?

Additionally, could someone please tell me which mission will help me review this topic?

What actually happened:

KeyError                                  Traceback (most recent call last)
/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-10d2b84cb9b7> in <module>()
      3 reviews = pd.read_csv('fandango_scores.csv')
      4 
----> 5 norm_reviews = reviews['FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']
      6 
      7 print(norm_reviews[:1])

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140 
   2141     def _getitem_column(self, key):

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147 
   2148         # duplicate columns & possible reduce dimensionality

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3841 
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528 
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars')

Hey nilssonjosh,

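You just forgot the double brackets when creating the variable norm_reviews. To select several columns you have to pass a list inside the brackets, and the result is a new DataFrame. With single brackets, pandas treats the comma-separated names as one tuple key and looks for a single column with that label, which is why you get the KeyError. (Single brackets with one column name return a Series.) It should be like this: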

norm_reviews = reviews[['FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']]
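
To see the difference between the bracket forms, here is a minimal sketch. It uses a small made-up DataFrame as a stand-in for fandango_scores.csv (the film names and values are invented for illustration), with only three of the columns from the question:

```python
import pandas as pd

# Hypothetical stand-in for fandango_scores.csv; the values are invented.
reviews = pd.DataFrame({
    'FILM': ['Film A', 'Film B'],
    'RT_user_norm': [4.3, 4.0],
    'Fandango_Stars': [5.0, 4.5],
})

# Single brackets with one column name: a Series.
one_column = reviews['FILM']

# Double brackets (a list inside the indexing brackets): a DataFrame.
several_columns = reviews[['FILM', 'Fandango_Stars']]

# Single brackets with comma-separated names: Python builds the tuple
# ('FILM', 'Fandango_Stars') and pandas looks for ONE column with that label.
try:
    reviews['FILM', 'Fandango_Stars']
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(type(one_column).__name__)
print(type(several_columns).__name__)
print(raised_key_error)
```

The last case is the same situation as the original line of code: no column is labeled with that tuple, so the lookup fails.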

About creating a variable called cols: that's not necessary. It is just a way to make your code more readable for others. Additionally, if you are having difficulty deciding when to use single or double brackets, I would recommend a quick review of the first course of Step 2, which is Pandas and NumPy Fundamentals.

I hope my explanation helped. If so, please mark it as the solution.


Hello @nilssonjosh, welcome to the community!

You don’t need to create a separate variable, but keep in mind that when you want to select a list of columns from a DataFrame, you must pass a list inside the []. So, in order not to create another variable, you need to use the following syntax:

norm_reviews = reviews[['FILM', 'RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']]

Notice that I put a list inside reviews[], which is the same as creating a list named cols and running the following code:

norm_reviews = reviews[cols]
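
As a quick check of that equivalence, here is a small sketch. The DataFrame is a made-up stand-in for fandango_scores.csv (invented values, only three of the original columns), and the names via_variable and inline are just for illustration:

```python
import pandas as pd

# Hypothetical stand-in for fandango_scores.csv; the values are invented.
reviews = pd.DataFrame({
    'FILM': ['Film A', 'Film B'],
    'RT_user_norm': [4.3, 4.0],
    'Fandango_Stars': [5.0, 4.5],
})

# Option 1: store the column names in a list first.
cols = ['FILM', 'Fandango_Stars']
via_variable = reviews[cols]

# Option 2: write the same list directly inside the indexing brackets.
inline = reviews[['FILM', 'Fandango_Stars']]

# Both selections contain the same columns and the same data.
same_result = via_variable.equals(inline)
print(same_result)
```

Either way pandas receives a list of column names, so the choice comes down to readability.
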

Very helpful! Thank you!


Thanks for pointing me in the right direction!


@nilssonjosh Anytime :)