Guided Project: Profitable App Profiles for the App Store and Google Play Markets: **remove duplicates**

my_dataquest_notebook from the site

My result differs from the solution notebook.
The number of unique entries should be 9659 (it corresponds with the solution from this point).

BUT when I investigate further, I find some apps, like the ones below, for which the MAX filtering alone does NOT work, because the row with the MAX number of reviews appears more than once :slight_smile: :

def check_apps(nameApp, dataset):
    for eachApp in dataset[1:]:
        name = eachApp[0]
        if name == nameApp:
            print(eachApp)
    
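# check_apps prints the matching rows itself and returns None,
# so it is called directly instead of being wrapped in print().
check_apps('Quick PDF Scanner + OCR FREE', android)
print('\n')
check_apps('Google My Business', android)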
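To make the situation concrete, here is a minimal sketch on made-up rows (not real entries from the dataset). It assumes the same layout as above, with the name at index 0 and the review count at index 3, and it builds a `reviews_max` dictionary in the usual way. When an identical row appears twice, both copies tie for the maximum, so filtering on the maximum alone keeps both:

```python
# Made-up rows in the same layout as the dataset:
# name at index 0, number of reviews at index 3.
rows = [
    ['App A', 'TOOLS', '4.0', '100'],
    ['App A', 'TOOLS', '4.0', '250'],
    ['App A', 'TOOLS', '4.0', '250'],  # exact copy of the row above
    ['App B', 'TOOLS', '4.2', '10'],
]

# Highest review count seen for each app name.
reviews_max = {}
for row in rows:
    name = row[0]
    n_reviews = float(row[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

# Filtering on the maximum alone keeps every row that ties for it.
kept = [row for row in rows if float(row[3]) == reviews_max[row[0]]]
print(len(kept))  # 3: both copies of 'App A' with 250 reviews, plus 'App B'
```

This is why the cleaning loop needs a second condition that tracks which names have already been added.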

In this case after running “the cleaning of the duplicates”:

android_cleaning = []
existing_ones = []

for eachApp in android[1:]:
    name = eachApp[0]
    n_reviews = float(eachApp[3])
    
    if (reviews_max[name] == n_reviews) and (name not in existing_ones):
        android_cleaning.append(eachApp)
        existing_ones.append(eachApp)

explore_data(android_cleaning, 0, 3, True)
print(len(existing_ones))
print(len(android_cleaning))

I get 10054 entries, which is 395 more than expected, so the result still contains duplicates:

def check_duplicates(dataset):
    unique_AppNames = []
    duplicate_AppNames = []

    for eachApp in dataset[1:]:
        app_name = eachApp[0]
        if(app_name in unique_AppNames):
            duplicate_AppNames.append(app_name)
        
        else:
            unique_AppNames.append(app_name)

    print("Number of duplicate apps:", len(duplicate_AppNames))
    print('\n')
    print("Examples of duplicate apps:", duplicate_AppNames[:30])
    print('\n')
    
check_duplicates(android)
check_duplicates(android_cleaning)
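One thing to watch for, assuming `android` starts with a header row (which the `android[1:]` slices suggest): `android_cleaning` is built without a header, so `dataset[1:]` skips its first app. The sketch below is one possible variant using `collections.Counter`; the function name `count_duplicates` and the `has_header` parameter are hypothetical, and the rows are made up:

```python
from collections import Counter

def count_duplicates(dataset, has_header=False):
    """Return {app_name: number_of_extra_rows} for names seen more than once."""
    # Skip the first row only when it really is a header.
    rows = dataset[1:] if has_header else dataset
    counts = Counter(row[0] for row in rows)
    return {name: n - 1 for name, n in counts.items() if n > 1}

# Made-up rows without a header, like android_cleaning.
sample = [
    ['App A', 'TOOLS', '4.0', '250'],
    ['App A', 'TOOLS', '4.0', '250'],
    ['App B', 'TOOLS', '4.2', '10'],
]

extras = count_duplicates(sample)
print(sum(extras.values()))  # 1 duplicate row in total
```

With the real data this would be called as `count_duplicates(android, has_header=True)` and `count_duplicates(android_cleaning)`, again under the assumption about the header row.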

I would like some confirmation of this, or to hear whether someone else has run into the same situation, because I would need to eliminate those rows: I've noticed they are identical across all columns. Or maybe I did something wrong.

This is a clear image of how this criterion does not work for some duplicated apps:

check_apps('Quick PDF Scanner + OCR FREE', android_cleaning)
check_apps('Google My Business', android_cleaning)

print('\n')
check_apps('Quick PDF Scanner + OCR FREE', android)
check_apps('Google My Business', android)

I realized my mistake:

android_cleaning = []
existing_ones = []

for eachApp in android[1:]:
    name = eachApp[0]
    n_reviews = float(eachApp[3])
    
    if (reviews_max[name] == n_reviews) and (name not in existing_ones):
        android_cleaning.append(eachApp)
        existing_ones.append(name)  # append the name, not eachApp
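For anyone hitting the same thing, here is a minimal sketch on made-up rows of why appending the whole row breaks the check: a name (a string) never compares equal to a row (a list), so `name not in existing_ones` is always true. The sketch also swaps the list for a set, which is an optional choice here, not part of the guided project's solution; `seen_rows`, `seen_names`, and `cleaned` are illustrative names:

```python
# Made-up rows: name at index 0, number of reviews at index 3.
rows = [
    ['App A', 'TOOLS', '4.0', '250'],
    ['App A', 'TOOLS', '4.0', '250'],  # identical duplicate
    ['App B', 'TOOLS', '4.2', '10'],
]
reviews_max = {'App A': 250.0, 'App B': 10.0}

# A string is never equal to a list, so the name is never "found"
# when the tracking list holds whole rows.
seen_rows = [rows[0]]
print('App A' in seen_rows)  # False

# Tracking names instead makes the second condition work as intended.
cleaned = []
seen_names = set()
for row in rows:
    name = row[0]
    n_reviews = float(row[3])
    if reviews_max[name] == n_reviews and name not in seen_names:
        cleaned.append(row)
        seen_names.add(name)

print(len(cleaned))  # 2: one row per app
```

On a dataset of about ten thousand rows, a set also keeps the membership test fast, though a list of names gives the same result.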

Thank you anyway :slight_smile:

I’m glad you figured it out and that you shared your own solution and process! (You can mark your last post as the solution too, and other students can benefit.)

There was a post just started today about those coding moments that cause frustration, and this is a good example of that. :slight_smile:

Good luck on the rest of your project!