GP Popular DS Questions. Key errors in a nested for loop to find Tags ViewCount

Screen Link: Learn data science with Python and R projects

So, here I already have a tags dataframe with the individual tags as its index and the number of times each tag was used as a column.
Now I am trying to add the total number of views for each of those tags as a new column in the tags dataframe, but there seems to be a mistake somewhere.

My Code:

tags['views'] = 0
for tag in tags:
    for row in questions:
        if tag in row:
            tags.loc[tag]['views'] += questions.loc[row]['ViewCount']
        else:
            tags.loc[tag]['views'] = tags.loc[tag]['views']

What I expected to happen:

A nice “views” column with the sum of views for each tag.

What actually happened:

KeyError                                  Traceback (most recent call last)
/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

#<more error info here>

KeyError: 'count'

I understand there must be some mess with the .loc indices but I am not sure what to google to find out what it is. Please, help.

Can you share your notebook with this error?

Here is the project I ended up with: Popular Data Science Questions on Stack Exchange.ipynb (133.6 KB)
I now see that there is a problem with the loop: the tag is not in the row itself, but in the row["Tags"] column. So, maybe that was the issue.
I looked at the solution and copied some of the decisions from there. What I still don't totally understand is what difference df.iterrows() makes. I thought that a for loop iterated over rows by default. Why is that not the case here?


Iterating over a dataframe directly iterates over its column labels, not its rows:

>>> import seaborn as sns
>>> planets = sns.load_dataset("planets")
>>> df = planets.head()
>>> df
            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009

>>> for x in df: print(type(x), x)
... 
<class 'str'> method
<class 'str'> number
<class 'str'> orbital_period
<class 'str'> mass
<class 'str'> distance
<class 'str'> year

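This is most likely where the KeyError: 'count' in the question comes from. As a minimal sketch, assuming the tags dataframe has the tag names as its index and a column named count (the toy data below is made up for illustration), a plain for loop hands back the column labels, and looking one of them up with .loc fails:

```python
import pandas as pd

# Hypothetical stand-in for the tags dataframe from the question:
# tag names as the index, a "count" column, and the new "views" column.
tags = pd.DataFrame({"count": [3, 2]}, index=["python", "pandas"])
tags["views"] = 0

# A plain for loop yields column labels, not index labels.
loop_values = [tag for tag in tags]
print(loop_values)  # ['count', 'views']

# .loc with a single label searches the index, so a column label is not found.
try:
    tags.loc["count"]
    error_message = None
except KeyError as err:
    error_message = str(err)
print(error_message)

# Iterating over the index gives the tag names instead.
index_values = list(tags.index)
print(index_values)  # ['python', 'pandas']
```

So to loop over the tags themselves, iterating over tags.index is one option.
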
However, iterating over df.iterrows() yields one (index, row) tuple per row:

>>> for x in df.iterrows(): print(type(x), x)
... 
<class 'tuple'> (0, method            Radial Velocity
number                          1
orbital_period              269.3
mass                          7.1
distance                     77.4
year                         2006
Name: 0, dtype: object)
<class 'tuple'> (1, method            Radial Velocity
number                          1
orbital_period            874.774
mass                         2.21
distance                    56.95
year                         2008
Name: 1, dtype: object)
<class 'tuple'> (2, method            Radial Velocity
number                          1
orbital_period              763.0
mass                          2.6
distance                    19.84
year                         2011
Name: 2, dtype: object)
<class 'tuple'> (3, method            Radial Velocity
number                          1
orbital_period             326.03
mass                         19.4
distance                   110.62
year                         2007
Name: 3, dtype: object)
<class 'tuple'> (4, method            Radial Velocity
number                          1
orbital_period             516.22
mass                         10.5
distance                   119.47
year                         2009
Name: 4, dtype: object)
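
Putting this together for the original problem, here is a rough sketch rather than the official solution. It assumes questions has a Tags column holding a list of tag names per row and a numeric ViewCount column; the toy data is invented for illustration. Note also that tags.loc[tag]['views'] += ... is chained indexing, which may modify a temporary copy instead of tags, so the sketch uses tags.loc[tag, "views"] instead.

```python
import pandas as pd

# Hypothetical toy data; the real questions dataframe has more columns.
questions = pd.DataFrame({
    "Tags": [["python", "pandas"], ["python"], ["r"]],
    "ViewCount": [100, 50, 7],
})
tags = pd.DataFrame({"count": [2, 1, 1]}, index=["python", "pandas", "r"])
tags["views"] = 0

# Loop version: iterrows() yields (index label, row as a Series).
for _, row in questions.iterrows():
    for tag in row["Tags"]:
        # A single .loc call with row and column labels writes to tags itself.
        tags.loc[tag, "views"] += row["ViewCount"]

print(tags)

# Vectorized alternative: one row per (question, tag) pair, then sum per tag.
views_by_tag = questions.explode("Tags").groupby("Tags")["ViewCount"].sum()
print(views_by_tag)
```

The explode/groupby version gives the same totals without an explicit loop. Since both are indexed by tag name, the result could also be assigned directly with tags["views"] = views_by_tag.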