Removing duplicate entries part one - Mobile Apps Profile Guided Project

In the "Removing duplicate entries: part one" step, I am getting a NameError traceback. This is the code I entered:

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Below is all the code I have entered from the start. I have tried to review it against the solution page, but the error still persists.
Project below:

#MY FIRST PROJECT IN DATA SCIENCE

This project is about a developer of free apps for the Google Play Store and the App Store. Because the apps are free, we do not make money directly from them; revenue depends on having enough subscribers for these free apps.

So our goal is to analyze the data to find out which of these free apps attract more subscribers.

In [11]:

from csv import reader

# Load the App Store data set and split the header from the rows
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios_file = list(read_file)
ios_header = ios_file[0]
ios = ios_file[1:]

# Load the Google Play data set and split the header from the rows
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android_file = list(read_file)
android_header = android_file[0]
android = android_file[1:]

Now we have to create a function to explore the data.

In [8]:

def explore_data(dataset, start, end, rows_and_column=False):
    # Print the rows in dataset[start:end], each followed by a blank line
    dataset_slice = dataset[start:end]
    for rows in dataset_slice:
        print(rows)
        print('\n')

    # Optionally report the size of the whole data set
    if rows_and_column:
        print('Number of rows:', len(dataset))
        print('Number of column:', len(dataset[0]))

print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

Number of rows: 7197
Number of column: 16

In [14]:

print(android[10472])
print('\n')
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']

Number of rows: 10841
Number of column: 13

In [15]:

print(len(android))
del android[10472]
print(len(android))

10841
10840

In [24]:

for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

In [4]:

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

NameError                                 Traceback (most recent call last)
in ()
      2 unique_apps = []
      3
----> 4 for app in android:
      5     name = app[0]
      6     if name in unique_apps:

NameError: name 'android' is not defined

I will not be removing the duplicate entries at random.

Instead, I will keep the entry that has the highest number of subscriptions, since our aim is to find the apps that attract the most subscribers.
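A minimal sketch of that selection, assuming the review count (index 3, the Reviews column in android_header) is what measures the highest number. The sample rows and the keep_most_reviewed helper are hypothetical, made up for illustration, and are not part of the project code:

```python
# Hypothetical sample rows in the same layout as googleplaystore.csv:
# index 0 is the app name, index 3 is the review count.
sample_android = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967'],
]

def keep_most_reviewed(rows):
    # First pass: record the highest review count seen for each app name.
    reviews_max = {}
    for app in rows:
        name = app[0]
        n_reviews = float(app[3])
        if name not in reviews_max or n_reviews > reviews_max[name]:
            reviews_max[name] = n_reviews

    # Second pass: keep one row per name, the one matching that maximum.
    cleaned = []
    already_added = set()
    for app in rows:
        name = app[0]
        if float(app[3]) == reviews_max[name] and name not in already_added:
            cleaned.append(app)
            already_added.add(name)
    return cleaned

cleaned_sample = keep_most_reviewed(sample_android)
print(len(cleaned_sample))
```

The already_added set matters when two rows for the same app tie on the maximum count; without it, both rows would be kept.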

You will have to rerun all your cells from the beginning.
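The failing cell is numbered In [4], which suggests the kernel was restarted and the cell that creates android has not been run in the new session. A small sketch of what is probably happening, using a one-row stand-in list instead of the real CSV file:

```python
# Fresh session: `android` has not been assigned yet, so using it fails.
try:
    android
    before = 'defined'
except NameError as err:
    before = str(err)

# Stand-in for rerunning the loading cell
# (the real cell builds `android` from googleplaystore.csv).
android = [['Instagram', 'SOCIAL', '4.5', '66577313']]

# Now the name exists and later cells can use it.
after = len(android)
print(before)
print(after)
```

In the notebook, "Restart & Run All" from the Kernel menu reruns every cell in order, so the loading cell executes before the duplicate check.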

Alright then, let me try it out.
