Loop cleans additional row when run a second time, first guided project

Hello everyone. I am working on the first guided project and I’m getting an unexpected result. Basically, I’m running a loop to clean some data using .remove(row). When I run the loop the first time, I get a result that I expect to be final, but it changes if I run the loop more than once. After a few runs (this is consistent) it doesn’t change any more. Thanks for your help.

My Code:

import csv
from csv import reader
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android_data = list(read_file)

android_data = android_data[1:]

print(len(android_data))

special_character_android = []

for row in android_data:
    c_count = 0
    app_name = row[0]
    for character in app_name:
        if ord(character) > 127:
            c_count += 1
    if c_count > 2:
        special_character_android.append(row)
        android_data.remove(row)


print(len(android_data))
print(len(special_character_android))

if len(special_character_android) < 10:
    print(special_character_android)

The output from the first block is 10841. This stays consistent.

If you run the second block one time you get 10780 and 61.

If you run it a second time you get 10779 and 1, along with this printed row:
[['あなカレ【BL】無料ゲーム', 'FAMILY', '4.7', '6073', '8.5M', '100,000+', 'Free', '0', 'Mature 17+', 'Simulation', 'February 25, 2018', '4.2.2', '2.3 and up']]

Why am I not catching this row the first time? I’m having a similar problem in the attached notebook (it takes multiple loops to stop changing), except there I’m also running a loop to remove the apps that aren’t free:

for row in android_data:
    if row[6] != 'Free':
        android_data.remove(row)

and it has to run multiple times before the data set stops changing. The reasons may be entirely different, but it seems like the same behavior.

App Downloads Project.ipynb (25.1 KB)

googleplaystore.csv (1.3 MB)

AppleStore.csv (708.8 KB)

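You are removing rows from android_data while you iterate over it, so android_data changes at each iteration of your for loop. Every time a row is removed, the rows after it shift down by one position, and the loop steps past the row that moved into the freed slot. Rows that should have been removed can therefore survive a pass, and each rerun catches a few more until nothing is left to remove. This affects your special_character_android list as well.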

For example, say you have a simple list a = [1, 2, 3, 4] and you loop over it, printing each number but also removing that number from the list at each iteration:

a = [1, 2, 3, 4]

for item in a:
    print(item)
    a.remove(item)

print(a)

What do you think the output of the above will be? Once you have a guess, run the code and see whether it matches.

Something similar happens in your code when you remove rows from android_data as you iterate through them.
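
To make the connection to your data concrete, here is a small sketch of the same effect on rows shaped like yours. The app names and the helper is_non_english are made up for illustration; the check only mirrors your ord(character) > 127 count. The loop is buggy on purpose.

```python
def is_non_english(name):
    """Return True when more than two characters fall outside ASCII."""
    return sum(1 for ch in name if ord(ch) > 127) > 2


# Hypothetical rows: two non-English names sit right next to each other.
rows = [["Instagram"], ["爱奇艺视频"], ["あなカレ無料ゲーム"], ["Facebook"]]

removed = []
for row in rows:  # buggy on purpose: rows shrinks while we loop over it
    if is_non_english(row[0]):
        removed.append(row)
        rows.remove(row)  # later rows shift down; the next one is skipped

print(rows)     # the second non-English row is still in here
print(removed)  # only the first non-English row was caught
```

The row that survives is the one directly after a removed row, which is why a second pass over your data picks up an extra row.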

I ran this in Python Tutor and can see how I lose the value/index relationship. And tuples are immutable, so I suppose I have to build a new list, or maybe use a list comprehension or something.
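
A minimal sketch of the "build a new list" idea, assuming a few made-up rows in place of the real android_data. The helper name count_non_ascii and the list name english_android are hypothetical, not from the project.

```python
def count_non_ascii(text):
    """Count the characters whose code point is above the ASCII range."""
    return sum(1 for ch in text if ord(ch) > 127)


# Hypothetical stand-in for android_data (only two columns per row).
android_data = [
    ["Instagram", "SOCIAL"],
    ["爱奇艺视频", "VIDEO_PLAYERS"],
    ["あなカレ無料ゲーム", "FAMILY"],
    ["Docs To Go™ Free", "BUSINESS"],
]

english_android = []
special_character_android = []
for row in android_data:  # android_data itself is never modified
    if count_non_ascii(row[0]) > 2:
        special_character_android.append(row)
    else:
        english_android.append(row)

# The same filter written as a list comprehension.
english_only = [row for row in android_data if count_non_ascii(row[0]) <= 2]

print(len(english_android), len(special_character_android))
```

Because the loop only reads android_data and writes to other lists, every row is visited exactly once and rerunning the cell gives the same result.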

Thanks!

a = [1, 2, 4, 6]

for item in a:
    if item % 2 == 0:
        a.remove(item)

Run this and you keep 4 unintentionally. Thanks!!!
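
For comparison, one possible fix is to loop over a copy of the list, so that removals from the original cannot disturb the iteration. This is only a sketch of one option; a list comprehension would work as well.

```python
a = [1, 2, 4, 6]

for item in a[:]:  # a[:] is a shallow copy; the loop walks the copy
    if item % 2 == 0:
        a.remove(item)  # changes a, not the copy being iterated

print(a)  # [1]
```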