Stuck on First Project

Screen Link: https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/5/removing-duplicate-entries-part-two

On question 1 within Part 5 (Removing Duplicate Entries Part 2) of the “Guided Project: Profitable App Profiles for the App Store and Google Play Markets,” I am having trouble figuring out what code to use. The instructions are:

  • If name already exists as a key in the reviews_max dictionary and reviews_max[name] < n_reviews, update the number of reviews for that entry in the reviews_max dictionary.
  • If name is not in the reviews_max dictionary as a key, create a new entry in the dictionary where the key is the app name, and the value is the number of reviews. Make sure you don’t use an else clause here, otherwise the number of reviews will be incorrectly updated whenever reviews_max[name] < n_reviews evaluates to False.

The last sentence of the second bullet (shown in bold on the original page) is what has me confused. It says not to use an else clause, but the solution appears to use one.

The solution is displayed as:

reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

Could some of you share how you did this part? Is there more than one way to approach this? I appreciate your patience with my novice questions, but that's why we are here, right?

-troc25

1 Like

Because if you use:

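else:
    reviews_max[name] = n_reviews

The else branch will also run when name is already in the dictionary and reviews_max[name] < n_reviews is False. In that case it overwrites the stored value with a smaller (or equal) number of reviews, and you don’t want this.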
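A minimal sketch of the difference, assuming a small made-up list in place of the real android dataset (the rows, app names, and the two function names are hypothetical; only index 0 and index 3 matter here):

```python
# Hypothetical rows standing in for the android dataset:
# index 0 is the app name, index 3 is the number of reviews.
android = [
    ["App A", "", "", "100"],
    ["App A", "", "", "50"],   # duplicate entry with fewer reviews
    ["App B", "", "", "10"],
]


def max_reviews_with_else(rows):
    reviews_max = {}
    for app in rows:
        name = app[0]
        n_reviews = float(app[3])
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
        else:
            # Also runs when the name exists but the stored count is
            # already larger, so the stored maximum gets overwritten.
            reviews_max[name] = n_reviews
    return reviews_max


def max_reviews_with_elif(rows):
    reviews_max = {}
    for app in rows:
        name = app[0]
        n_reviews = float(app[3])
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
        elif name not in reviews_max:
            # Only runs the first time a name is seen.
            reviews_max[name] = n_reviews
    return reviews_max


print(max_reviews_with_else(android))  # {'App A': 50.0, 'App B': 10.0}
print(max_reviews_with_elif(android))  # {'App A': 100.0, 'App B': 10.0}
```

With else, the second "App A" row replaces 100 with 50; with elif, the larger value is kept.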

3 Likes

Hi @troc25,

If you look carefully, you will see that the solution uses elif (short for "else if"), not else, and that makes the difference.

So

if name in reviews_max and reviews_max[name] < n_reviews:
    reviews_max[name] = n_reviews

The code above checks whether the app name is already in the reviews_max dictionary and whether the new value of n_reviews, from n_reviews = float(app[3]), is greater than the value currently stored for that name in the dictionary.

If it finds the app name already in the dictionary with a smaller number of reviews, it updates the dictionary value to the greater number of reviews by executing reviews_max[name] = n_reviews.

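If we use else, then whenever any of the conditions in if name in reviews_max and reviews_max[name] < n_reviews: is false, it executes reviews_max[name] = n_reviews. That is fine when the name is not yet in the dictionary, but when the name is already there with an equal or larger number of reviews, the existing entry is overwritten with the smaller value. Hence the dictionary would no longer hold the maximum number of reviews for each app.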

But since you are using

elif name not in reviews_max:
    reviews_max[name] = n_reviews

this will only be executed if the app name is not in the dictionary.

Did this clarify your doubt?

4 Likes

Yes it does. Oh man do I feel like a potato. Thank you so much. Sometimes you just need another set of eyes to look at something. Thanks for your clarity and feedback.

1 Like

Thank you. I was staring right at it the whole time.

2 Likes

Hi,

I didn’t use elif, I just used another if so my code was:

# Creating new dataset with duplicates removed
reviews_max = {}

for app in android[0:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max:
        reviews_max[name] = n_reviews

print('Expected length', len(reviews_max))

It seemed to give me the same result. Why do we need to use elif?

1 Like

Hi @jdj,

I can see that it would have worked in this case, but it isn't the most efficient way to do it. With if/elif, the second condition is not evaluated when the first if condition is satisfied.

In your case, both conditions have to be checked in each iteration, irrespective of the outcome of the first.
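As a rough check that the two variants agree, here is a sketch that runs both on a small made-up dataset (the rows and the function names with_two_ifs and with_elif are hypothetical, not part of the guided project):

```python
# Hypothetical rows: index 0 is the app name, index 3 is the review count.
rows = [
    ["App A", "", "", "100"],
    ["App A", "", "", "250"],
    ["App B", "", "", "10"],
    ["App A", "", "", "50"],
]


def with_two_ifs(rows):
    reviews_max = {}
    for app in rows:
        name = app[0]
        n_reviews = float(app[3])
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
        # This condition is evaluated on every iteration,
        # even when the first if has already updated the entry.
        if name not in reviews_max:
            reviews_max[name] = n_reviews
    return reviews_max


def with_elif(rows):
    reviews_max = {}
    for app in rows:
        name = app[0]
        n_reviews = float(app[3])
        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
        # This condition is skipped whenever the first if was True.
        elif name not in reviews_max:
            reviews_max[name] = n_reviews
    return reviews_max


print(with_two_ifs(rows) == with_elif(rows))  # True
print(with_elif(rows))  # {'App A': 250.0, 'App B': 10.0}
```

The results match because, after the first if updates an entry, the name is already a key, so the second if is False anyway; elif just avoids that extra membership check.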

2 Likes