Explanation of how this loop works

Hey!

I was wondering if someone could explain to me how the code below works. It is from the first guided project in Python Fundamentals. The code works just fine (I got it from the mission instructions). However, I just can’t comprehend how it actually works and why. Any explanation would be amazing!

reviews_max = {}

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

    elif name not in reviews_max:
        reviews_max[name] = n_reviews

Hi @rushtondavid23

The first thing to understand here is why we are doing this step.

Some apps have multiple entries in our dataset. This happened because the data was collected over a period of time, so a few app names are repeated. The duplicate rows are not identical, though: the latest row for an app should have the most reviews, because more people will have written a review by the time the more recent data was collected.

So we are going to keep the latest version of each duplicated app (the row with the most reviews) and remove all other rows with the same app name.

In order to do that, we need to get the name of the app and the number of reviews for that app. That is what is happening here:

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])

The for loop iterates through google_data one row at a time. We call the current row app.
The name of the app is at index 0 of this row, which is app[0].
Similarly, the number of reviews is at index 3 (the 4th column), hence n_reviews = float(app[3]). The float() call converts the review count to a number so that it can be compared.
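
To see just this extraction step in isolation, here is a minimal sketch. The two rows in google_data below are made up for illustration (they are not taken from the real dataset); the only assumption is that the name sits in column 0 and the review count in column 3, stored as text.

```python
# Hypothetical sample rows: [name, category, rating, reviews]
google_data = [
    ["App A", "TOOLS", "4.1", "100"],
    ["App B", "GAME", "4.5", "250"],
]

extracted = []
for app in google_data:
    name = app[0]               # first column: the app name
    n_reviews = float(app[3])   # fourth column: review count, converted from text
    extracted.append((name, n_reviews))

print(extracted)  # [('App A', 100.0), ('App B', 250.0)]
```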

Now, let us look at the empty dictionary that we created to store the name and number of reviews of the latest entry for each app: reviews_max = {}

So when this line executes inside the for loop:
if name in reviews_max and reviews_max[name] < n_reviews:
it checks whether the name of the app in the current row is already in the dictionary. During the first iteration the dictionary is empty, so name in reviews_max is False, the whole condition is False, and Python moves on to the next condition, which is elif name not in reviews_max:

Since the first app name cannot be found in an empty dictionary, this condition is satisfied and the next line, reviews_max[name] = n_reviews, is executed. This stores the key:value pair name:n_reviews in the dictionary.
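
The first iteration can be traced by hand with a small sketch. The app name and review count are hypothetical, and the second condition is written as the membership test name not in reviews_max. Note that and short-circuits: once name in reviews_max is False, Python never evaluates reviews_max[name], which is why looking up a missing key does not raise an error here.

```python
reviews_max = {}
name, n_reviews = "App A", 100.0   # hypothetical first row

# `and` short-circuits: because `name in reviews_max` is False,
# reviews_max[name] is never evaluated, so no KeyError is raised.
first_condition = name in reviews_max and reviews_max[name] < n_reviews

if first_condition:
    reviews_max[name] = n_reviews
elif name not in reviews_max:
    reviews_max[name] = n_reviews   # this branch runs for the first row

print(first_condition)  # False
print(reviews_max)      # {'App A': 100.0}
```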

In a later iteration, if the first part of the condition, name in reviews_max, is satisfied (i.e. the row is a duplicate of an app we have already seen), Python goes on to check the second part, reviews_max[name] < n_reviews. If the current row has a higher number of reviews than the value stored in the dictionary, the nested line reviews_max[name] = n_reviews is executed and the name:n_reviews pair is updated. If it does not, neither branch runs and the stored value stays as it is.
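
Here is a rough sketch of what happens when a duplicate shows up. The helper update_max is hypothetical (it is not part of the project); it just wraps one loop iteration so the two duplicate cases can be tried one after the other.

```python
def update_max(reviews_max, name, n_reviews):
    """Apply one loop iteration to the dictionary (hypothetical helper)."""
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews   # duplicate with more reviews: overwrite
    elif name not in reviews_max:
        reviews_max[name] = n_reviews   # name seen for the first time: store it
    # duplicate with fewer or equal reviews: neither branch runs


reviews_max = {"App A": 100.0}          # "App A" was stored by an earlier row

update_max(reviews_max, "App A", 150.0)
after_higher = dict(reviews_max)        # snapshot after a higher count

update_max(reviews_max, "App A", 120.0)
after_lower = dict(reviews_max)         # snapshot after a lower count

print(after_higher)  # {'App A': 150.0}
print(after_lower)   # {'App A': 150.0}
```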

This process continues, and at the end the dictionary has exactly one entry per app name, holding its highest number of reviews.
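
Putting it all together, here is the whole loop run on a small made-up google_data with duplicates (the rows are hypothetical). The second loop is not from the project: it is one possible shorter way to get the same result with dict.get and max, shown only for comparison.

```python
# Hypothetical sample rows: [name, category, rating, reviews]
google_data = [
    ["App A", "TOOLS", "4.1", "100"],
    ["App B", "GAME", "4.5", "250"],
    ["App A", "TOOLS", "4.1", "150"],
    ["App A", "TOOLS", "4.1", "120"],
]

reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(reviews_max)  # {'App A': 150.0, 'App B': 250.0}

# Alternative sketch (not the project's code): keep the larger of the
# stored value and the current value; .get falls back to n_reviews
# when the name has not been seen yet.
compact = {}
for app in google_data:
    name, n_reviews = app[0], float(app[3])
    compact[name] = max(compact.get(name, n_reviews), n_reviews)

print(compact == reviews_max)  # True
```

Four rows go in and two entries come out, one per app name, each with its highest review count.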

I hope the logic was clear to you.


That breakdown helps massively!

Thanks. I get it much better now.

Thanks for taking the time to do that.
