I am trying to count the duplicate apps in the Google Play Store data set. My logic and code took a roundabout route, and I am not getting the right answer. Can you help me figure out where I went wrong?
My code is as follows:
### Finding duplicate entries in Google Play Store data ###
# First finding if duplicates exist
list1 = []
for row in android:
    list1.append(row[0])  # pulled all the app names into a separate list
print("The total number of app names is:", len(list1))

apps_android = {}
apps_duplicate = []  # creating a list of all apps that have duplicate entries
for aname in list1:
    if aname in apps_android:
        apps_android[aname] += 1
    else:
        apps_android[aname] = 1
print("The total number of apps in apps_android is:", len(apps_android))

for each in apps_android:
    if apps_android[each] > 1:
        apps_duplicate.append(each)  # for any app with more than one listing, name gets added to the list
print("The total number of duplicate apps are:", len(apps_duplicate))
I am facing an error with the numbers. The results I get are as follows:
The total number of app names is: 10841
The total number of apps in apps_android is: 9660
The total number of duplicate apps are: 798
I cannot figure out where the error is. The number of entries in apps_android (9660) matches the number of unique apps in the actual solution. However, the unique apps plus the duplicate apps come to 9660 + 798 = 10458, not 10841.
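
One possible reading of the gap, offered as a sketch and not a confirmed diagnosis: 798 counts the distinct app names that appear more than once, while the difference 10841 - 9660 = 1181 counts the extra rows. An app listed three times adds one name to apps_duplicate but two extra rows. The toy list `names` below is hypothetical and stands in for the app-name column (row[0]) of `android`; `collections.Counter` replaces the hand-built dictionary only for brevity.

```python
from collections import Counter

# Hypothetical toy data standing in for the app names in `android`.
names = ["A", "B", "A", "C", "A", "B", "D"]

counts = Counter(names)  # app name -> number of listings

total_rows = len(names)
unique_apps = len(counts)

# What apps_duplicate measures: names that have more than one listing.
names_with_duplicates = sum(1 for n in counts.values() if n > 1)

# What total - unique measures: every listing beyond the first one.
extra_rows = sum(n - 1 for n in counts.values() if n > 1)

print("Total rows:", total_rows)
print("Unique apps:", unique_apps)
print("Names with duplicates:", names_with_duplicates)
print("Extra (duplicate) rows:", extra_rows)
```

In this toy data, unique apps plus extra rows equals the total row count, while unique apps plus names with duplicates does not. If the same holds for the real file, the two numbers answer different questions and neither is wrong.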