Guided Project 2: Question regarding Duplicate apps?

Alright, second guided project done. :blush: Celebration!

But one thing about this project has been bugging me. On page six of the lesson, after we find and remove duplicate entries in the Google Play dataset, the guide says there are no duplicates in the Apple apps dataset and that we can check this using the ID column. Why are we using the ID column for that? I ran a duplicate check on the Apple apps in my project, alongside the Google apps, using the app name column, and found duplicates for ‘Mannequin Challenge’ and ‘VR Roller Coaster’. So I know those apps appear more than once in the Apple App Store dataset. Why do we not care about that?

Here is the code I used for my dupe check:

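def dupe_check(data_set, name_col_num):
    unique_apps = []
    duplicate_apps = []
    for app in data_set:
        name = app[name_col_num]
        if name in unique_apps:
            # Name already seen, so record it as a duplicate
            duplicate_apps.append(name)
        else:
            # First time we see this name
            unique_apps.append(name)
    return unique_apps, duplicate_apps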

apple_uniq_apps, apple_dupe_apps = dupe_check(apps_data, 1)
google_uniq_apps, google_dupe_apps = dupe_check(play_store_data, 0)


['Mannequin Challenge', 'VR Roller Coaster']

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', ' - Simple Meetings']
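
One way to look at the ID question is to run the same kind of check on both columns and compare the results. The sketch below is not from the guided project: `find_duplicates` and the toy rows (with made-up ids) are hypothetical. It only illustrates that a repeated name does not necessarily mean a repeated ID. Whether that is what happens in the real App Store dataset would need to be checked against the actual rows.

```python
from collections import Counter


def find_duplicates(rows, col):
    """Return the sorted values in column `col` that occur more than once."""
    counts = Counter(row[col] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical rows in [id, name] order; the ids are invented for illustration
# and are NOT taken from the real dataset.
toy_apps = [
    ["1001", "Mannequin Challenge"],
    ["1002", "Mannequin Challenge"],
    ["1003", "VR Roller Coaster"],
    ["1004", "Some Other App"],
]

dupes_by_name = find_duplicates(toy_apps, 1)  # check the name column
dupes_by_id = find_duplicates(toy_apps, 0)    # check the id column

print("Duplicate names:", dupes_by_name)
print("Duplicate ids:", dupes_by_id)
```

If the name check reports entries while the ID check reports none, the rows sharing a name are stored as separate entries with different IDs, and it is worth printing those rows in full before deciding whether to treat them as duplicates.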

Guided Project: Profitable App Profiles for the App Store and Google Play Markets Link

Guided Project_ Profitable App Profiles for the App Store and Google Play Markets.tar (2.0 MB)
Basics.ipynb (45.5 KB)


Hi @ilondire05

Welcome to the Dataquest community. Great to see that you’re playing around with the guided project. This is a common question among those who dig a little deeper.

You can find the solution to your question in this thread.