Counting duplicate apps in the Android data set: getting a different value

I’m working on Guided Project 1 and tried a different approach to counting the duplicate apps in the Android data set, but I’m getting a different number from the one in the solution and I don’t know what I’m doing wrong.

What I’m trying to do is create a frequency table of all the duplicate apps.

Here’s my code:

unique = []
duplicated = {}

for row in g_play_data:
    app_name = row[0]
    if app_name in duplicated:
        duplicated[app_name] += 1
    elif app_name in unique:
        duplicated[app_name] = 2
    else:
        unique.append(app_name)

print(len(duplicated))

What I expected to happen:
To get an answer of 1181 duplicate apps

What actually happened:
Printing the length of the duplicated dictionary gives 798.

Help 🙂

Hi @judita, your code is a little hard to follow, but to fix the problem, try the following:

  • Use if-else instead of if-elif-else.

Have a look:

unique = []
duplicated = {}

for row in g_play_data:
    app_name = row[0]
    if app_name in unique:
        duplicated[app_name] += 1
    else:
        unique.append(app_name)

print(len(duplicated))

But then you’re not adding the key to the dictionary when the app_name is in “unique” but not yet in the dictionary “duplicated”, are you?

That is what these lines are doing:

elif app_name in unique:
    duplicated[app_name] = 2
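
To illustrate the point, here is a minimal sketch using a made-up list of rows (`sample_rows` is hypothetical and only stands in for `g_play_data`). With a plain if-else, the first repeat reaches `+= 1` for a key that has never been set, so Python should raise a `KeyError`:

```python
# Hypothetical stand-in for g_play_data: the app name is in column 0.
sample_rows = [["Instagram"], ["Slack"], ["Instagram"]]

unique = []
duplicated = {}
error_message = None

try:
    for row in sample_rows:
        app_name = row[0]
        if app_name in unique:
            # On the first repeat the key does not exist yet,
            # so there is nothing to increment.
            duplicated[app_name] += 1
        else:
            unique.append(app_name)
except KeyError as err:
    error_message = f"KeyError: {err}"

print(error_message)
```

The elif branch in the original code avoids this by creating the key with a starting count of 2.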

Hi @judita,

I think it is because you are using duplicated as a dictionary, so you are collecting only the unique names of the duplicated apps. For example, if ‘Instagram’ is repeated, you store Instagram: 2 and the key appears only once. If you instead use a list and append the name each time it occurs, it appears in the list twice.

I hope this makes sense.

Maybe try adding up the values of the dictionary and see whether that matches the answer.
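
As a rough sketch of that check, here is the original loop run on a made-up list of rows (`sample_rows` is hypothetical, not the real data). The length of the dictionary counts each duplicated name once, the sum of its values counts every occurrence including the first, and the difference between the two counts only the extra rows. I am assuming, not confirming, that the solution's 1181 is the last of these:

```python
# Hypothetical stand-in for g_play_data: the app name is in column 0.
sample_rows = [
    ["Instagram"], ["Slack"], ["Instagram"],
    ["Instagram"], ["Slack"], ["Zoom"],
]

unique = []
duplicated = {}

for row in sample_rows:
    app_name = row[0]
    if app_name in duplicated:
        duplicated[app_name] += 1
    elif app_name in unique:
        duplicated[app_name] = 2  # second sighting: first copy + this one
    else:
        unique.append(app_name)

# Each duplicated name counted once.
distinct_duplicated_names = len(duplicated)
# Every occurrence of a duplicated name, including the first copy.
total_occurrences = sum(duplicated.values())
# Only the repeats, i.e. the rows you would delete to remove duplicates.
extra_rows = total_occurrences - distinct_duplicated_names

print(distinct_duplicated_names, total_occurrences, extra_rows)
```

If the same pattern holds on the real data, `len(duplicated)` would be expected to come out smaller than the number of repeat rows, which would fit the gap between 798 and 1181.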

But the question is: wouldn’t it be easier to use two lists instead of a dictionary?
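
A minimal sketch of the two-list idea, using a hypothetical `sample_rows` in place of `g_play_data` (the names `unique_apps` and `duplicate_apps` are chosen here for illustration). Every repeat row is appended to the second list, so its length counts repeat rows rather than distinct names:

```python
# Hypothetical stand-in for g_play_data: the app name is in column 0.
sample_rows = [
    ["Instagram"], ["Slack"], ["Instagram"],
    ["Instagram"], ["Slack"], ["Zoom"],
]

unique_apps = []
duplicate_apps = []

for row in sample_rows:
    app_name = row[0]
    if app_name in unique_apps:
        # Seen before: record this row as a repeat.
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)

print(len(duplicate_apps))
print(len(unique_apps))
```

The trade-off is that the list no longer tells you directly how many times each app repeats; the dictionary version keeps that per-app count.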