Hello @greta.meroni, welcome to the DQ community!
To find the duplicate app names in the Applestore dataset, use the code below:
duplicate = []
unique = []
for app in ios:  # note that ios is the name of the variable holding my Applestore dataset
    name = app[1]  # the app name is in the second column
    if name not in unique:
        unique.append(name)
    else:
        duplicate.append(name)

print("Number of duplicate apps: ", len(duplicate))
print('\n')
print("Examples of duplicate apps: ", duplicate[:10])
Output:
Number of duplicate apps: 2
Examples of duplicate apps: ['Mannequin Challenge', 'VR Roller Coaster']
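As a side note, the same check can be written with `collections.Counter` from the standard library. This is only a minimal sketch: `ios_sample` and its rows are made-up stand-ins for your `ios` list, and I am assuming the app name is at index 1, as in the code above. Unlike the loop above, it lists each duplicated name once, no matter how many times it repeats.

```python
from collections import Counter

# Hypothetical stand-in for the `ios` list of rows (made-up values);
# the app name is assumed to be at index 1.
ios_sample = [
    ['1', 'App A', 'Games'],
    ['2', 'App B', 'Games'],
    ['3', 'App A', 'Games'],
]

# Count how many rows carry each name.
name_counts = Counter(row[1] for row in ios_sample)

# Keep only the names that appear more than once.
duplicate_names = [name for name, count in name_counts.items() if count > 1]

print("Number of duplicated names:", len(duplicate_names))
print("Duplicated names:", duplicate_names)
```

With your real data you would pass `ios` instead of `ios_sample` (skipping the header row if your list still includes it).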
To find out which rows in your dataset contain the apps with the duplicate name 'Mannequin Challenge':
for app in ios:  # note that ios is the name of the variable holding my Applestore dataset
    name = app[1]
    if name == "Mannequin Challenge":
        print(app)
Output:
['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
You can do the same for 'VR Roller Coaster' by replacing "Mannequin Challenge" in the code above with 'VR Roller Coaster'. You can then check through the rows to read all the characteristics of each app and verify that they are not actual duplicates.
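If you would rather not edit the name by hand each time, one option is to collect the rows for every duplicated name in a single pass. The sketch below is just an illustration: `group_rows_by_name` is a hypothetical helper I made up, `sample` holds made-up rows, and the name is again assumed to be at index 1.

```python
from collections import defaultdict

def group_rows_by_name(dataset, names, name_index=1):
    """Return a dict mapping each name in `names` to the rows that carry it."""
    wanted = set(names)
    grouped = defaultdict(list)
    for row in dataset:
        if row[name_index] in wanted:
            grouped[row[name_index]].append(row)
    return dict(grouped)

# Hypothetical rows standing in for the real dataset (made-up values).
sample = [
    ['1', 'App A', 'Games'],
    ['2', 'App B', 'Games'],
    ['3', 'App A', 'Games'],
]

grouped = group_rows_by_name(sample, ['App A'])
for name, rows in grouped.items():
    print(name)
    for row in rows:
        print('   ', row)
```

On your data you could call it as `group_rows_by_name(ios, duplicate)` and compare the printed rows side by side.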
Let me know if this answers your question.