Guided Project: Profitable Apps... Problem with deleting row[10472] in dataset

My Code:

reviews_max = {}

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

What I expected to happen:

The del statement would remove the row that has an inconsistency in its ratings (row[10472]).

What actually happened:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_17864\2343688775.py in <module>
      3 for app in google_data:
      4     name = app[0]
----> 5     n_reviews = float(app[3])
      6 
      7     if name in reviews_max and reviews_max[name] < n_reviews:

ValueError: could not convert string to float: '3.0M'

OK, that’s just the bare bones. The real problem is that every time I delete the row, restart the kernel, and rerun all the cells, the row in question comes back. To reiterate: if I use del google_data[10472] and then check the result with print(google_data[10472]) and len(google_data), both confirm that I did in fact remove the correct row and that there are 10,841 total rows instead of the original 10,842.

I’ve searched many related threads, and the only thing I could find that seemed similar to my problem was in this thread at the bottom, where user april.g suggests something about dataframes. I’m pretty sure I haven’t learned that yet, so naturally I’ve come here.

I’m obviously really new at this, so any help is greatly appreciated!

(Sorry for the weird variable names!)


Hi @Evo,

Just to clarify, do you comment out the del google_data[10472] line once you’ve confirmed that a row was deleted?

The del statement should be a permanent part of your code, like so:

reviews_max = {}

# Remove the malformed row every time the data is loaded
del google_data[10472]

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

The row in question will always come back after every restart because del only removes the row from the list held in Python’s memory; it doesn’t delete the row in the actual .csv file.
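To make this concrete, here is a small sketch that shows the same behaviour on a throwaway file. The file name sample.csv, the load helper, and the three sample rows are made up for illustration; they are not taken from your notebook or the real dataset.

```python
import csv
import os
import tempfile


def load(path):
    """Read a CSV file into a list of lists (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Made-up stand-in for the real dataset: a header, one good row, one bad row.
rows = [
    ["App", "Category", "Rating", "Reviews"],
    ["Good App", "TOOLS", "4.1", "159"],
    ["Broken App", "1.9", "19", "3.0M"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    data = load(path)
    del data[2]                      # removes the row from the in-memory list only
    rows_after_del = len(data)

    reloaded = load(path)            # what effectively happens after a kernel restart
    rows_after_reload = len(reloaded)

print(rows_after_del, rows_after_reload)
```

The first count reflects the list after del, while the second reflects a fresh read of the unchanged file, so the bad row is present again.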

Every time you restart the kernel, the whole csv file will be read and stored as a variable; all initial data will be the same every time including the incorrect row. Because of that, you need to delete that incorrect row every time you restart as well.
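One caveat with keeping del in a cell: if you rerun that cell without restarting the kernel, it will delete whatever row now sits at that index. A possible guard, sketched below, is to check the row before deleting it. The helper name drop_row_if_matches and the sample data are hypothetical, and this is just one way to do it, not something the guided project requires.

```python
def drop_row_if_matches(data, index, expected_name):
    """Delete data[index] only if its first column equals expected_name.

    Returns True if a row was deleted, False otherwise.
    """
    if index < len(data) and data[index][0] == expected_name:
        del data[index]
        return True
    return False


# Made-up sample data; in the notebook this would be the loaded dataset.
sample_data = [
    ["App", "Category", "Rating", "Reviews"],
    ["Good App", "TOOLS", "4.1", "159"],
    ["Broken App", "1.9", "19", "3.0M"],
    ["Another App", "GAME", "4.5", "1200"],
]

first_run = drop_row_if_matches(sample_data, 2, "Broken App")   # deletes the bad row
second_run = drop_row_if_matches(sample_data, 2, "Broken App")  # does nothing on a rerun
print(first_run, second_run, len(sample_data))
```

With this check in place, rerunning the cell leaves the remaining rows alone because the row at that index no longer has the expected name.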

Ah, I see. I do comment out the del statement after I use it. From now on I’ll leave it in there.

Thank you for the solution, and for the explanation of how del works!

No worries @Evo.

Good luck with the project.
