Guided Project: Analyzing App Profiles (AppleStore and Google PlayStore)

My Code:

def duplicate_finder(dataset, index):
    unique_apps = []
    duplicate_apps = []
    for row in dataset:
        app_name = row[index]
        if app_name in unique_apps:
            duplicate_apps.append(app_name)
        else:
            unique_apps.append(app_name)
    return duplicate_apps

google_duplicate = duplicate_finder(google_dataset, 0)
print(len(google_duplicate))

apple_duplicate = duplicate_finder(apple_dataset, 1)
print(len(apple_duplicate))

What I expected to happen:
The number of duplicate apps for Google PlayStore should be 1181.
The number of duplicate apps for AppleStore should be zero.

What actually happened:
Number of Google PlayStore duplicates is 1102.
Number of AppleStore duplicate apps is 2.

1102
2

However, when the code is written outside a function, something interesting happens.
Number of Google PlayStore duplicates is 1102
Number of AppleStore duplicate apps is ZERO

unique_apps = []
duplicate_apps = []
for row in apple_dataset:
    app_name = row[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
print(len(duplicate_apps))

Output:

0

Please explain:

  1. Why the number of duplicates for Google PlayStore is 1102 and not 1181.
  2. Why the number of duplicates for AppleStore is 2 when the code is written inside a function but zero when the code is written outside the function.

Thank you!

It would be better if you also added the link to the Mission Step in your post. It makes the step easy to access and lets others help you faster.

Thanks for the update.
The mission step is 4 in the Guided Project: Profitable App Profiles for the App Store and Google Play Markets.

Here is the link directly to the mission step 4:
https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/4/removing-duplicate-entries-part-one

Because the code outside the function uses index 0 (app_name = row[0]) instead of index 1, which is the index you passed to duplicate_finder for the AppleStore dataset.
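To illustrate why the column index matters, here is a minimal sketch that runs the duplicate_finder function from the post on a few made-up rows. The rows and app names are hypothetical and only assume a layout where column 0 holds a unique id and column 1 holds the app name; they are not taken from the real AppleStore.csv.

```python
def duplicate_finder(dataset, index):
    # Same logic as the function in the original post.
    unique_apps = []
    duplicate_apps = []
    for row in dataset:
        app_name = row[index]
        if app_name in unique_apps:
            duplicate_apps.append(app_name)
        else:
            unique_apps.append(app_name)
    return duplicate_apps

# Hypothetical rows in the assumed layout [id, name].
toy_rows = [
    ["1001", "App A"],
    ["1002", "App B"],
    ["1003", "App A"],
    ["1004", "App B"],
    ["1005", "App C"],
]

by_id = duplicate_finder(toy_rows, 0)    # compares the unique ids
by_name = duplicate_finder(toy_rows, 1)  # compares the app names

print(len(by_id))    # 0
print(len(by_name))  # 2
```

With the same data and the same logic, the count changes only because a different column is being compared.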

Your code has no issue. I ran it and got a length of 1181. So perhaps you made some changes to the dataset before running this code, which led to a different result. I would recommend making sure you haven’t changed anything before this code, and that you are running the code cells in the correct order.

Thanks for your quick reply. The issue was resolved once I used the raw file from the direct link in the solution and deleted the one provided in the mission step 1 of the guided project. Please note that no edits were made in the raw file when originally downloaded from the link in the mission step 1 of the guided project.

It seems there are different versions of the CSV file for Google PlayStore:

  • The one in mission step 1.
  • The one in the solution, with the direct link.

I say this because when I downloaded the CSV file for Google PlayStore from the solution’s direct link, I got accurate results. However, the incorrect number of duplicate apps persisted with the CSV file downloaded from the link in mission step 1.
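One way to confirm that two downloads really differ is to compare their row counts and a content hash. The sketch below is only an illustration: summarize_csv is a hypothetical helper, the file names in the commented usage are placeholders, and the in-memory demo uses made-up content rather than the real data.

```python
import csv
import hashlib
import io

def summarize_csv(file_obj):
    """Return (row_count, sha256 hex digest) for an open text file."""
    text = file_obj.read()
    rows = list(csv.reader(io.StringIO(text)))
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return len(rows), digest

# Usage with real files (placeholder names, adjust to your own paths):
# with open("googleplaystore_step1.csv", encoding="utf8", newline="") as a, \
#      open("googleplaystore_solution.csv", encoding="utf8", newline="") as b:
#     print(summarize_csv(a))
#     print(summarize_csv(b))

# Tiny in-memory demo with made-up content.
demo_a = io.StringIO("App,Category\nApp A,GAME\nApp A,GAME\n")
demo_b = io.StringIO("App,Category\nApp A,GAME\n")

count_a, hash_a = summarize_csv(demo_a)
count_b, hash_b = summarize_csv(demo_b)

print(count_a, count_b)   # 3 2
print(hash_a == hash_b)   # False
```

If the row counts or hashes differ between the two files, any duplicate count computed from them can differ too.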

Anyways, thanks for the quick reply. It was helpful.

Index 0 is the first column, which contains the app names in the AppleStore.csv file, and index 1 is the second column, which contains the app names in googleplaystore.csv. I used index 1 to find the duplicate apps in the googleplaystore.csv file.
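Since the thread disagrees about which index holds the app names, one way to settle it is to look the column up in the header row instead of hard-coding a number. The sketch below is an assumption-laden illustration: name_column_index is a hypothetical helper, and the column names "App" and "track_name" and the sample headers are assumed, so check them against the header row of your own files.

```python
def name_column_index(header, candidates=("App", "track_name")):
    """Return the index of the first candidate column name found in header."""
    for name in candidates:
        if name in header:
            return header.index(name)
    raise ValueError("no app-name column found in header: %r" % (header,))

# Hypothetical header rows; print your own with print(dataset[0]).
google_header = ["App", "Category", "Rating"]
apple_header = ["id", "track_name", "size_bytes"]

google_index = name_column_index(google_header)
apple_index = name_column_index(apple_header)

print(google_index)  # 0
print(apple_index)   # 1
```

Passing the looked-up index to duplicate_finder avoids comparing the wrong column by accident.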