Alright, second guided project done. Celebration!
But one thing has been bugging me since I ran into it while working through the project. On page six of the lesson, after we find and remove duplicate entries for apps in the Google Play dataset, the guide says that there are no duplicates in the Apple apps dataset and that we can check this using the ID column. Why are we using the ID column for that? In my project I checked the Apple apps for duplicates alongside the Google apps, using the app name column, and found duplicates for the 'Mannequin Challenge' and 'VR Roller Coaster' apps. So I know that those apps appear more than once in the Apple App Store dataset. Why do we not care about that?
Here is the code I used for my dupe check:
    def dupe_check(data_set, name_col_num):
        # Names seen so far, and names that showed up again
        unique_apps = []
        duplicate_apps = []
        for app in data_set:
            name = app[name_col_num]
            if name in unique_apps:
                duplicate_apps.append(name)
            else:
                unique_apps.append(name)
        return unique_apps, duplicate_apps

    # App name is column 1 in the Apple data and column 0 in the Google data
    apple_uniq_apps, apple_dupe_apps = dupe_check(apps_data, 1)
    google_uniq_apps, google_dupe_apps = dupe_check(play_store_data, 0)

    print(apple_dupe_apps)
    print(len(apple_dupe_apps))
    print('\n')
    print(google_dupe_apps[:5])
    print(len(google_dupe_apps))
['Mannequin Challenge', 'VR Roller Coaster']
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
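For comparison, here is a minimal sketch of what running the same kind of check on the ID column versus the name column might look like. It is only an illustration: the function name `find_duplicates`, the sample rows, and the ID values are all made up, and it assumes the ID sits in column 0 and the app name in column 1.

```python
def find_duplicates(rows, col_index):
    """Return the values in column `col_index` that appear more than once."""
    seen = set()        # values encountered so far
    duplicates = []     # values encountered a second (or later) time
    for row in rows:
        value = row[col_index]
        if value in seen:
            duplicates.append(value)
        else:
            seen.add(value)
    return duplicates


# Hypothetical sample rows in the form [id, track_name].
# The IDs below are invented for illustration, not taken from the dataset.
sample_apple_rows = [
    ['1001', 'Mannequin Challenge'],
    ['1002', 'Mannequin Challenge'],
    ['1003', 'VR Roller Coaster'],
    ['1004', 'VR Roller Coaster'],
    ['1005', 'Some Other App'],
]

dupes_by_id = find_duplicates(sample_apple_rows, 0)
dupes_by_name = find_duplicates(sample_apple_rows, 1)

print(dupes_by_id)    # []
print(dupes_by_name)  # ['Mannequin Challenge', 'VR Roller Coaster']
```

If the ID check comes back empty on the real data while the name check does not, that would suggest the repeated names belong to rows with different IDs, which could mean separate listings that happen to share a name. I have not confirmed that this is what is going on in the actual dataset, so it is worth printing the full rows for those two names to see how they differ.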
Guided Project_ Profitable App Profiles for the App Store and Google Play Markets.tar (2.0 MB)
Basics.ipynb (45.5 KB)