DataFrame.iterrows(): what is the usage?

I don't understand how DataFrame.iterrows() is used. Can you help explain the code? Thanks

Screen Link: https://app.dataquest.io/m/469/guided-project%3A-popular-data-science-questions/8/relations-between-tags

My Code:


# Start with an empty dictionary; the loop below fills it in.
tag_view_count = {}

for index, row in questions.iterrows():
    for tag in row['Tags']:
        if tag in tag_view_count:
            tag_view_count[tag] += row['ViewCount']
        else:
            tag_view_count[tag] = row['ViewCount']
            
tag_view_count = pd.DataFrame.from_dict(tag_view_count, orient="index")
tag_view_count.rename(columns={0: "ViewCount"}, inplace=True)

most_viewed = tag_view_count.sort_values(by="ViewCount").tail(20)

most_viewed.plot(kind="barh", figsize=(16,8))

You can check my response to another question about iterrows(): "Guided Project: Building a Spam Filter with Naive Bayes itterows()?"

It’s not the same content, but you should be able to get the idea of it.

I would also recommend referring to the documentation for these functions more often. It answers most questions about how they work, and it usually includes examples that show how to use them.

pandas.DataFrame.iterrows

DataFrame.iterrows()

Iterate over DataFrame rows as (index, Series) pairs.

Yields=>

index: label or tuple of label => The index of the row. A tuple for a MultiIndex.

data: Series => The data of the row as a Series.

it: generator => A generator that iterates over the rows of the frame.
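To see what those (index, Series) pairs look like in practice, here is a minimal sketch on a made-up two-row DataFrame. The data and the row labels "q1" and "q2" are hypothetical stand-ins for the project's questions DataFrame, not the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the project's `questions` DataFrame.
questions = pd.DataFrame(
    {
        "Tags": [["python", "pandas"], ["python"]],
        "ViewCount": [10, 5],
    },
    index=["q1", "q2"],
)

seen = []
for index, row in questions.iterrows():
    # `index` is the row label; `row` is a Series keyed by column name.
    seen.append((index, type(row).__name__, row["ViewCount"]))
    print(index, row["Tags"], row["ViewCount"])
```

Each pass through the loop gives you one row, and you read a cell from it with row["column name"], which is exactly what row['Tags'] and row['ViewCount'] do in your code.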

for index, row in questions.iterrows():
This iterates over the rows of the questions DataFrame as (index, Series) pairs, using two loop variables: index holds the row label and row holds that row's data.

for tag in row['Tags']:
This inner loop iterates over each tag in the row's "Tags" column. You are building a dictionary whose keys are tags and whose values are the total view counts for each tag. The dictionary, tag_view_count, should be empty before the loop starts. If the tag is not in the dictionary yet, you store row['ViewCount'] as its value. If the tag is already there, you have seen it in an earlier row, so you add this row's view count to the existing value with
tag_view_count[tag] += row['ViewCount']. I think the code is easier to follow if the condition is flipped:

        if tag not in tag_view_count:
            tag_view_count[tag] = row['ViewCount']
        else:
            tag_view_count[tag] += row['ViewCount']
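Here is a small, self-contained sketch of that accumulation so you can run it and watch the totals build up. The three rows are invented for illustration; only the loop itself mirrors your code.

```python
import pandas as pd

# Hypothetical toy data: each row has a list of tags and a view count.
questions = pd.DataFrame(
    {
        "Tags": [["python", "pandas"], ["python"], ["pandas", "numpy"]],
        "ViewCount": [10, 5, 7],
    }
)

tag_view_count = {}  # must exist and be empty before the loop starts
for index, row in questions.iterrows():
    for tag in row["Tags"]:
        if tag not in tag_view_count:
            # First time this tag appears: start its total.
            tag_view_count[tag] = row["ViewCount"]
        else:
            # Tag seen before: add this row's views to the running total.
            tag_view_count[tag] += row["ViewCount"]

print(tag_view_count)
```

"python" appears in the first two rows, so its total is 10 + 5; "pandas" appears in the first and third rows, so its total is 10 + 7.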

tag_view_count = pd.DataFrame.from_dict(tag_view_count, orient="index")
You are converting the dictionary into a DataFrame. With orient="index" you are saying that the keys of the tag_view_count dictionary should become the index (row labels) of the DataFrame. The default is orient="columns", which would make the keys the column names instead.
tag_view_count.rename(columns={0: "ViewCount"}, inplace=True)
You are renaming the column label from 0 to "ViewCount". inplace=True (the default is False) means the original DataFrame is modified directly instead of a renamed copy being returned.
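A minimal sketch of these two steps, assuming some made-up totals in the dictionary. The second DataFrame, by_columns, is only there to contrast the default orient="columns"; it is not part of your project.

```python
import pandas as pd

# Hypothetical totals, shaped like the dictionary the loop produces.
tag_view_count = {"python": 15, "pandas": 17, "numpy": 7}

# orient="index": dictionary keys become row labels.
# The single unnamed column gets the default label 0.
by_index = pd.DataFrame.from_dict(tag_view_count, orient="index")
by_index.rename(columns={0: "ViewCount"}, inplace=True)

# orient="columns" (the default): dictionary keys become column names.
# Values must be list-like here, so this uses a different toy dictionary.
by_columns = pd.DataFrame.from_dict({"a": [1, 2], "b": [3, 4]}, orient="columns")

print(by_index)
print(by_columns)
```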

most_viewed = tag_view_count.sort_values(by="ViewCount").tail(20)
You are sorting tag_view_count by ViewCount in ascending order (smallest to largest, the default), keeping the last 20 rows, which are the 20 tags with the highest view counts, and assigning the result to a variable named most_viewed.
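As a quick illustration, here is the same sort-then-tail pattern on a four-row toy DataFrame, keeping the top 2 instead of the top 20. The data is invented, and nlargest is shown only as an alternative I am suggesting, not something the project uses.

```python
import pandas as pd

# Hypothetical view counts indexed by tag.
tag_view_count = pd.DataFrame(
    {"ViewCount": [15, 17, 7, 3]},
    index=["python", "pandas", "numpy", "r"],
)

# Ascending sort, then keep the last rows: the largest values, smallest first.
most_viewed = tag_view_count.sort_values(by="ViewCount").tail(2)

# nlargest picks the same rows but orders them largest first.
top = tag_view_count.nlargest(2, "ViewCount")

print(most_viewed)
print(top)
```

The ascending order from sort_values plus tail is convenient for a horizontal bar chart, because the rows are drawn from the bottom up and the largest bar ends up at the top.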

most_viewed.plot(kind="barh", figsize=(16,8))
You are plotting a horizontal bar chart with kind="barh" and setting the figure size with figsize=(16,8), which means a width of 16 inches and a height of 8 inches.
@candiceliu93 If you find this useful, please mark my answer as the solution.