Enumerate vs pandas.DataFrame.iterrows

Screen Link: https://app.dataquest.io/m/469/guided-project%3A-popular-data-science-questions/7/most-used-and-most-viewed

This is just an alternate way of handling the task Count how many times each tag was viewed.

The solution in the solution notebooks suggests using enumerate and this works - provided you have not played with the questions dataframe in any way that would change the ordering of the index or dropped any rows. The solution supplied depends on the result of enumerate(questions["Tags"]) matching up with index of the questions dataframe and this is not necessarily true. You can ensure the index matches the enumerate values by re-indexing just before you call enumerate or you can use pandas.DataFrame.iterrows which, because you are operating on the whole row, Allows you to extract the row['Tags'] and row['ViewCount'] from the same piece of data.

Code:

tag_views_dict = {}
for index, row in questions.iterrows():
    for tag in row['Tags']:
        if tag in tag_views_dict:
            tag_views_dict[tag] += row['ViewCount']
        else:
            tag_views_dict[tag] = row['ViewCount']
4 Likes

True. I’ll consider making this modification.

Thank you for the feedback!

1 Like

Hello @thughes and @Bruno,

I don’t get the all logic behind the code. From the pandas.DataFrame.iterrows documentation, it is mentioned that: “Iterate over DataFrame rows as (index, Series) pairs.” Why should we use iterrows when we already use for loops?
Could anyone explain it further to me please?
pop DS question

Thank you in advance.

1 Like

Answer:

3 Likes