I'm working on Guided Project 1 and tried a different approach to counting the duplicate apps in the Android data set, but I'm getting a different number from the one in the solution and I don't know what I'm doing wrong.
What I’m trying to do is create a frequency table of all the duplicate apps.
Here’s my code:
unique = []
duplicated = {}
for row in g_play_data:
    app_name = row[0]
    if app_name in duplicated:
        duplicated[app_name] += 1
    elif app_name in unique:
        duplicated[app_name] = 2
    else:
        unique.append(app_name)
print(len(duplicated))
What I expected to happen:
To get an answer of 1181 duplicate apps
What actually happened:
Printing the length of the duplicated dictionary gives 798
I think it is because duplicated is a dictionary, so you are really collecting the unique names of the duplicate apps. For example, if ‘Instagram’ is repeated, you store Instagram: 2 and the key appears only once in the dictionary, however many times the app is repeated. If you use a list and append the name each time it shows up again, every repeated row gets its own entry, so the list ends up longer than the dictionary.
I hope this makes sense.
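Maybe try adding up the values of the dictionary and see if it matches the answer. Keep in mind that each value also counts the app's first occurrence, so you would need to subtract the number of keys from that sum to count only the repeated rows.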
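Here is a rough sketch of that check. It uses a tiny made-up list in place of g_play_data (the rows and app names are invented for illustration), and the three variable names at the end are my own, so treat it as a way to see the difference between the counts rather than as the project's solution.

```python
# Hypothetical sample rows standing in for g_play_data;
# only column 0 (the app name) matters here.
sample_rows = [
    ["Instagram"], ["Instagram"], ["Instagram"],
    ["Slack"], ["Slack"],
    ["Notes"],
]

unique = []
duplicated = {}
for row in sample_rows:
    app_name = row[0]
    if app_name in duplicated:
        duplicated[app_name] += 1
    elif app_name in unique:
        duplicated[app_name] = 2
    else:
        unique.append(app_name)

# Number of distinct names that occur more than once.
distinct_duplicated = len(duplicated)
# Every row belonging to those names, first occurrence included.
total_occurrences = sum(duplicated.values())
# Rows beyond the first occurrence of each name.
extra_rows = total_occurrences - distinct_duplicated

print(distinct_duplicated, total_occurrences, extra_rows)
```

On the real data, the first number is what len(duplicated) reports. If the solution counts every repeated row, the last number is the one to compare against it.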
But the question is, wouldn’t it be easier to use two lists instead of a dictionary?
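For what it's worth, a two-list version could look something like the sketch below. The names unique_apps and duplicate_apps and the sample rows are made up for the example, and I'm assuming the goal is to count each repeated row once.

```python
# Hypothetical sample rows standing in for g_play_data.
sample_rows = [
    ["Instagram"], ["Instagram"], ["Instagram"],
    ["Slack"], ["Slack"],
    ["Notes"],
]

unique_apps = []      # first occurrence of each name
duplicate_apps = []   # one entry per repeated row

for row in sample_rows:
    name = row[0]
    if name in unique_apps:
        # Seen before, so this row is a duplicate.
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(len(duplicate_apps))  # repeated rows
print(len(unique_apps))     # distinct app names
```

Here 'Instagram' lands in duplicate_apps twice (for its second and third rows), which is why the list length comes out larger than the number of keys in the dictionary version.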