I tried my own way of finding duplicates, but it's giving me the wrong answer.

Screen Link:

My Code:

appnames=[]
for row in googlefile:
    nameofapp=row[0]
    appnames.append(nameofapp)
print(appnames[:5])  # some of the app names
print('\n')
print('-------------------------------------------------')
print('\n')


count={}
for app in appnames:
    if app in count:
        count[app]+=1
    else:
        count[app]=1


listofduplicates=[]
listofunique=[]
for key in count:
    if count[key]>1:
        listofduplicates.append(key)
    elif count[key]==1:
        listofunique.append(key)
print(len(listofduplicates))
print(len(listofunique))

What I expected to happen:

What actually happened: I’m not getting 1181 as my number of duplicates but 8862 instead, and I don’t understand why.

This is my full output

['App', 'Photo Editor & Candy Camera & Grid & ScrapBook', 'Coloring book moana', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint']


798
8862

I think the trick was identifying which ones were duplicates and uniques. Then after you’ve identified these get the count once you’ve done that. Master @Bruno - help us out.

body_parts =  ['foot', 'hand', 'finger', 'toe', 'foot']

list_of_duplicates = []
list_of_uniques = []

for x in body_parts:
    if x not in list_of_uniques:
        list_of_uniques.append(x)
    else:
        list_of_duplicates.append(x)

print(len(list_of_duplicates))
print(len(list_of_uniques))

hey @jordangarden55 and @eugeniosp3

I am not Bruno! I hope this helps you instead of confusing you more.

Jordan, your count dictionary already captures the duplicates, although what it stores is the number of times each app appears in the appnames list. If you sum the values in count, you get the total number of apps:

test = 0
for key in count:
    test += count[key]
print(test)

10840

and your count dictionary is something like this:

for key, value in count.items():
    if 7 <= value:
        print("App (", key,") has been repeated", value, "times")

Results in:
image

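So each app becomes a unique key in the dictionary, and its value is the number of times the app appears. When you split the dictionary, you are essentially splitting the 9659 unique keys, so you end up counting how many distinct app names are repeated rather than how many duplicate entries there are. The duplicate entries are everything beyond the first occurrence of each name: 10840 - 9659 = 1181.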

You can include this as an additional analysis step.
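Here is a rough sketch of the difference between the two numbers, using a made-up toy list instead of the real app names (`app_names`, `names_with_duplicates` and `extra_copies` are names invented for this illustration). It counts the distinct names that are repeated, and separately the extra entries beyond each name's first occurrence:

```python
from collections import Counter

# Hypothetical stand-in for the column of app names
app_names = ['a', 'b', 'a', 'c', 'a', 'b']

# Same idea as the hand-built count dictionary: name -> number of occurrences
count = Counter(app_names)

# Distinct names that occur more than once (what splitting the keys gives you)
names_with_duplicates = [name for name, n in count.items() if n > 1]

# Entries beyond the first occurrence of each name (what the two-list loop gives you)
extra_copies = sum(n - 1 for n in count.values())

print(len(names_with_duplicates))
print(extra_copies)
print(len(app_names) - len(count))  # same value as extra_copies
```

On this toy list the first number is 2 and the second is 3, which is why the two approaches disagree as soon as any name appears more than twice.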

To actually separate unique apps from duplicates, you only need two lists: if the app is already in the unique list, it goes to the duplicate list; if it’s not there yet (it is being encountered for the first time), it goes to the unique list, just as Eugeniosp3 described.

And yes, both lists are empty initially. However, every time a new app is encountered, the workflow moves to the branch that adds it to the unique list, so that list won’t be empty for long! When the same app is encountered again later in the loop, it is sent to the duplicate list.

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

I’m confused as to how it sorts out the duplicate and unique apps when there’s nothing present in unique apps yet. According to the code, wouldn’t “app” be absent in both duplicate apps and unique apps when it enters the for loop?

Think of it as the program picking up a box of papers, where each paper has a name on it: the name of an app.
When you write a for loop, the piece of paper with the name on it is ‘app’ in your code.

So now you are taking the papers out of that box one by one, picking up each ‘app’ and putting it into one of two separate containers. One container is labeled unique and the other is labeled duplicate.

Okay, so we start and pick up the very first paper. It’s the first time you see it, so you put it into the unique container. There is nothing in there yet and you have just started, so you treat it as not a duplicate simply because you don’t yet know what lies deeper in your huge box of apps.

You go through it like 10 more times until you pick up one you have seen before.

So now you say to yourself ‘hey, I’ve seen this one before.’

So you put it into the duplicate container.

This bit of logic is what the if statement does. In plain English it means: if this app is already in the unique container, don’t put it in there again; put it in the duplicate container instead.

This is called iteration and the box is the iterable object. For loops allow this type of iteration to occur.

Okay, so now that you’ve separated these out, you can go ahead and count. Luckily you don’t have to do it manually. You can just run the len() function on the two lists.
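To watch the box-of-papers idea play out, here is a small sketch that prints what happens to each paper. The app names in `papers` are made up for the illustration, and the container names `unique` and `duplicate` just mirror the labels in the analogy:

```python
# Hypothetical box of papers: each string is one app name
papers = ['Maps', 'Chess', 'Maps', 'Notes', 'Chess', 'Maps']

unique = []      # names seen for the first time
duplicate = []   # names that were already in the unique container

for app in papers:
    if app in unique:
        duplicate.append(app)
        print(app, '-> seen before, goes to duplicate')
    else:
        unique.append(app)
        print(app, '-> first time, goes to unique')

print(len(unique), len(duplicate))
```

The very first paper always lands in `unique`, because an empty list cannot contain anything yet.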

Something I do to make this easier on myself man is write it as I would say it to myself. We’re learning so you shouldn’t expect yourself to be able to do these wild and advanced moves.

I would read your code like this: "For app in android, the item at index 0 of app is the name; if name is in unique_apps already, put name into duplicates. Otherwise (else), put it in unique, because it was not there before."

I would have written it as:
if name is not in (if not in) unique_apps, put it in there (append)! Otherwise (else), put it in (append) duplicate_apps.

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
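
The "not in" wording described above could look roughly like this. It is only a sketch: the `android` rows here are made-up placeholders (name at index 0), not the real data set.

```python
# Hypothetical rows standing in for the android data set; the name is at index 0
android = [['Maps', '4.3'], ['Chess', '4.5'], ['Maps', '4.3'], ['Notes', '4.0']]

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name not in unique_apps:
        unique_apps.append(name)      # first time this name is seen
    else:
        duplicate_apps.append(name)   # name was already recorded

print(unique_apps)
print(duplicate_apps)
```

Both orderings of the if/else sort the names the same way; the choice is just about which one reads more naturally to you.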

hey @eugeniosp3

This is exactly what I did for my project.

The benefit of the count dictionary here is that, if I want to show which apps are repeated most often, the dictionary can be sorted to show the top 10 or bottom 10 results (depending on ascending or descending order), or I can apply a threshold, such as apps repeated more than 7 times.
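
As a rough illustration of that idea, here is one way the sorting and the threshold might be done. The `count` values below are invented for the example, and the threshold of 7 or more follows the `7 <= value` check shown earlier in the thread:

```python
# Hypothetical count dictionary: app name -> number of occurrences
count = {'Maps': 9, 'Chess': 2, 'Notes': 1, 'Mail': 7, 'Clock': 4}

# Sort (name, occurrences) pairs from most to least repeated
by_frequency = sorted(count.items(), key=lambda item: item[1], reverse=True)

# Top results (3 here to suit the toy data; use [:10] for a top 10)
top_3 = by_frequency[:3]

# Names at or above the threshold
repeated_often = [name for name, n in by_frequency if n >= 7]

print(top_3)
print(repeated_often)
```

Dropping `reverse=True` gives ascending order, so the same slice would return the least repeated apps instead.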

Thank you for the compliment :)

Your code doesn’t work, it needs some tweaking, but it’s close. Take a look at the printed lists:

>>> body_parts =  ['foot', 'hand', 'finger', 'toe', 'foot']
>>> 
>>> list_of_duplicates = []
>>> list_of_uniques = []
>>> 
>>> for x in body_parts:
...     if x not in list_of_uniques:
...         list_of_uniques.append(x)
...     else:
...         list_of_duplicates.append(x)
... 
>>> print(list_of_duplicates, list_of_uniques, sep="\n")
['foot']
['foot', 'hand', 'finger', 'toe']
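
One possible tweak, assuming "unique" is meant to be the names that occur exactly once (that reading is an assumption, and `occurrences`, `appears_once` and `appears_more` are names made up for this sketch): count everything first, then split.

```python
body_parts = ['foot', 'hand', 'finger', 'toe', 'foot']

# Count occurrences first, so each name's total is known before it is sorted
occurrences = {}
for part in body_parts:
    occurrences[part] = occurrences.get(part, 0) + 1

# Split by total number of occurrences
appears_once = [part for part, n in occurrences.items() if n == 1]
appears_more = [part for part, n in occurrences.items() if n > 1]

print(appears_more, appears_once, sep="\n")
```

With this version 'foot' shows up only in the repeated list, not in both.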

Anyway, it seems like you and Rucha got this, thank you for the help!
