Guided Project: Profitable App Profiles Python Question

Section: https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/5/removing-duplicate-entries-part-two

Can somebody explain to me, step by step, how this code works?

reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

    elif name not in reviews_max:
        reviews_max[name] = n_reviews

android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])

    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

reviews_max = {} is a dictionary.

Shouldn’t it be something like reviews_max{name, n_reviews}, etc., as the dictionary? Or have I misunderstood how dictionaries work?

I also don’t understand what’s going on in:

if (reviews_max[name] == n_reviews) and (name not in already_added):
    android_clean.append(app)
    already_added.append(name)

Hi @scchoi31,

Let’s try to reproduce the same situation. But first, let’s be clear about what we are trying to achieve. The previous code gave us a dictionary that contains every app name and the maximum number of reviews that app received. So first, let’s create that dictionary.

reviews_max = {
        'Facebook': 100000,
        'Gmail': 80000,
        'Instagram': 90000,
        'PUBG': 90000
        }
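
On the {} question: reviews_max = {} only creates an empty dictionary. The keys and values are added later, one at a time, inside the loop. Here is a minimal sketch of that first loop. It assumes a small made-up list called sample in place of the real android data (name in column 0, reviews in column 1), and it stores the result in reviews_max_demo so it does not overwrite the dictionary above.

```python
# Hypothetical sample rows: [app name, number of reviews].
sample = [
    ['Facebook', 100000],
    ['Instagram', 85000],
    ['Facebook', 98000],
    ['Instagram', 90000],
]

reviews_max_demo = {}  # empty dictionary; entries are added inside the loop

for row in sample:
    name = row[0]
    n_reviews = float(row[1])

    if name in reviews_max_demo and reviews_max_demo[name] < n_reviews:
        # Seen before, but this row has more reviews: overwrite the value.
        reviews_max_demo[name] = n_reviews
    elif name not in reviews_max_demo:
        # First time we see this app: create the key.
        reviews_max_demo[name] = n_reviews

print(reviews_max_demo)
# {'Facebook': 100000.0, 'Instagram': 90000.0}
```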

Now we want to use this dictionary to build a new list that contains no duplicate apps, making sure that for each app we keep the row with the max reviews. So how do we do that? Let’s create a dataset so that we can decide how to solve it.

dataset = [['Facebook', 100000],
           ['Facebook', 100000],
           ['Instagram', 85000],
           ['Facebook', 98000],
           ['PUBG', 22000],
           ['Instagram', 90000],
           ['PUBG', 90000],
           ['Gmail', 80000]]

Okay, now let’s think about how to remove duplicates while keeping the row with max reviews. First, we definitely need to loop through the dataset, and we need an empty list to which we can add the rows with max reviews. So let’s add an empty list and a for loop.

new_list = []
for row in dataset:
    ...  # the loop body comes next

What shall we do for each row? For each row, we want to check whether the review count in that row matches the max reviews stored for that app in our dictionary.

new_list = []
for row in dataset:
    name = row[0]
    reviews = row[1]
    if reviews == reviews_max[name]:
        ...  # what to do on a match comes next

Okay, so the condition is ready. Now we need to add our app to the new list if it meets the condition.

new_list = []
for row in dataset:
    name = row[0]
    reviews = row[1]
    if reviews == reviews_max[name]:
        new_list.append(row)

It seems like our code is ready; let’s give it a run.

print(new_list)
[['Facebook', 100000], ['Facebook', 100000], ['Instagram', 90000], ['PUBG', 90000], ['Gmail', 80000]]
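
If the list were too long to check by eye, one way to spot a leftover duplicate is to count how often each name appears. This is only a sketch: it uses collections.Counter from the standard library, which is not part of the course solution, and it copies the output above into a variable called result (a name made up here).

```python
from collections import Counter

# The list printed above, copied in so the snippet runs on its own.
result = [['Facebook', 100000], ['Facebook', 100000],
          ['Instagram', 90000], ['PUBG', 90000], ['Gmail', 80000]]

# Count how many rows each app name has.
name_counts = Counter(row[0] for row in result)

# Keep only the names that appear more than once.
duplicates = [name for name, count in name_counts.items() if count > 1]
print(duplicates)
# ['Facebook']
```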

So now we have all the rows whose review count equals the maximum. But there is a problem. Since Facebook had two rows with the same value as the one in the reviews_max dictionary, both of these rows got appended to the new list. How do we avoid that? One way is to make sure we don’t add an app to new_list if we have already added it. But how do we keep track of that? We can create an added list, to which we append the app name whenever we add a row to new_list. Then we only add a row to new_list if its app name is not already in the added list. To do this, we need another condition. Let’s add it.

new_list = []
added = []
for row in dataset:
    name = row[0]
    reviews = row[1]
    if reviews == reviews_max[name]:
        if name not in added:
            new_list.append(row)
            added.append(name)

Okay, so now our code is ready. Let’s give it a run and see whether it succeeds in doing what we wanted.

print(new_list)
[['Facebook', 100000], ['Instagram', 90000], ['PUBG', 90000], ['Gmail', 80000]]

Problem solved. We only have one Facebook row, as we expected. Now, instead of nesting one if condition inside another, we can combine both conditions using the and operator. So let’s do that!

new_list = []
added = []
for row in dataset:
    name = row[0]
    reviews = row[1]
    if (reviews == reviews_max[name]) and (name not in added):
        new_list.append(row)
        added.append(name)
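
For completeness, here is one way the two steps could be put together and run end to end. It is a sketch, not the course solution: the function name remove_duplicates is made up, it assumes each row looks like [name, reviews], and it tracks names in a set instead of a list, since "in" checks on a set stay fast as the data grows.

```python
def remove_duplicates(rows):
    """Keep one row per app name: the first row with the highest review count.

    Assumes each row looks like [name, reviews].
    """
    # Pass 1: highest review count seen for each name.
    reviews_max = {}
    for name, reviews in rows:
        if name not in reviews_max or reviews_max[name] < reviews:
            reviews_max[name] = reviews

    # Pass 2: keep the first row that matches that maximum.
    cleaned = []
    added = set()  # names already kept
    for row in rows:
        name, reviews = row
        if reviews == reviews_max[name] and name not in added:
            cleaned.append(row)
            added.add(name)
    return cleaned


dataset = [['Facebook', 100000], ['Facebook', 100000], ['Instagram', 85000],
           ['Facebook', 98000], ['PUBG', 22000], ['Instagram', 90000],
           ['PUBG', 90000], ['Gmail', 80000]]

print(remove_duplicates(dataset))
# [['Facebook', 100000], ['Instagram', 90000], ['PUBG', 90000], ['Gmail', 80000]]
```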

I hope this helps you understand it. If I misinterpreted your question, let me know. I would be happy to help. 🙂

Also, if you would like to know more about the syntax of Python dictionaries, I would recommend checking out this post:

Best,
Sahil

Absolutely brilliant explanation