Python Fundamentals Guided Project

I am finding it hard to convert the reviews column in the guided project to a floating-point number.

Screen Link: https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/5/removing-duplicate-entries-part-two

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])

I don’t know what is happening. Python is not able to convert my review value to a floating-point number. I need help.

Below is the error message:

ValueError                                Traceback (most recent call last)
in ()
      4 for app in google_data[1:]:
      5     name = app[0]
----> 6     n_reviews = float(app[3])
      7
      8

ValueError: could not convert string to float: '3.0M'

Hi @princefame1
You are getting this error because Python cannot convert the string '3.0M' to a float. To resolve the problem, I removed the row containing this review value when I was cleaning the data, because the row has missing data:

print(google_dataset[10473])

['Life Made WI-Fi Touchscreen Photo Frame',
 '1.9',
 '19',
 '3.0M',
 '1,000+',
 'Free',
 '0',
 'Everyone',
 '',
 'February 11, 2018',
 '1.0.19',
 '4.0 and up']
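
To see which rows trigger the error before deciding what to delete, a small check like the one below may help. It is only a sketch on made-up data: the tiny `google_data` list and the helper name `find_unconvertible_rows` are assumptions for illustration, not part of the guided project.

```python
# Hypothetical miniature of the dataset: a header row followed by app rows.
# Column 3 is assumed to hold the review count, as in the loop from the question.
google_data = [
    ["App", "Rating", "Size", "Reviews"],
    ["App A", "4.1", "19M", "159"],
    ["Broken App", "1.9", "19", "3.0M"],
    ["App B", "4.5", "8.7M", "87510"],
]


def find_unconvertible_rows(dataset, column):
    """Return (index, value) pairs for rows whose value in `column` float() rejects."""
    bad = []
    # Skip the header; start=1 keeps the indexes aligned with `dataset`.
    for index, row in enumerate(dataset[1:], start=1):
        try:
            float(row[column])
        except ValueError:
            bad.append((index, row[column]))
    return bad


bad_rows = find_unconvertible_rows(google_data, 3)
print(bad_rows)
```

The reported index refers to the list that still contains the header row, so it is off by one compared with a list from which the header has been removed.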

Hi @princefame1

Welcome to our Dataquest Community.

You are getting this error because you missed the instructions of slide 3 of Guided Project: Profitable App Profiles for the App Store and Google Play Markets.

@bahmed21 is correct about the instructions in slide 3, "Deleting Wrong Data".
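
As a rough illustration of that kind of cleaning step, the sketch below finds rows whose length differs from the header and deletes them before converting. It assumes the bad row can be recognized by its length, which the printed row above suggests but which I have not checked against the real file; the data and names here are made up.

```python
# Toy data; the real project reads the full Google Play file instead.
header = ["App", "Category", "Rating", "Reviews"]
apps = [
    ["App A", "ART_AND_DESIGN", "4.1", "159"],
    ["Broken App", "1.9", "19"],  # one value missing, so later values shift left
    ["App B", "GAME", "4.5", "87510"],
]

# Collect the indexes of rows that do not have one value per header column.
wrong = [i for i, row in enumerate(apps) if len(row) != len(header)]
print(wrong)

# Delete from the end so that the remaining indexes stay valid.
for i in reversed(wrong):
    del apps[i]

# With the malformed row gone, the conversion succeeds for every row.
n_reviews = [float(row[3]) for row in apps]
print(n_reviews)
```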

🙂

Thanks, it has been solved. I really appreciate it.

Hi,
I’m coming across the same issue. I did delete the row as instructed in slide 3, and my code matches the solution code posted on GitHub. However, the same error still appears.

Hi @ccontan

Try this: restart the kernel, run all cells, and then check again.

I’ve refreshed my screen, restarted the kernel, and re-run everything, but I’m still coming across the same issue. I’ve also re-run the code in a fresh Python notebook to make sure that it’s not caused by an error in the earlier lines of code, but I still get the same error.

Hey @ccontan

Can you upload the notebook?

Profiting on Google Play & App Store - Free Apps .ipynb (8.5 KB)

Attached.

Hey @ccontan

Take a look at the line where you delete the row and check that you are deleting the correct row.

I think you will find your mistake.

🙂

Got it, thanks. 😑

Hi,

I am trying to delete the row that has bad data, i.e. row number 10472. However, even after I drop that row, I still see the same record. Am I missing something here?

How do I delete this row from the dataset? Thanks for your help.

Hi @vkalyanraman. Try either data = data.drop(data.index[10472]) or data.drop(data.index[10472], inplace=True) so that the results are saved back to the dataframe.

Thanks April! It worked.

I still have a question.

Why do you have to assign the result back to the variable that refers to the dataframe? I thought that when you call a method on an object, you are actually changing the object. I don’t understand why the reference needs to be updated. Is this how it works in Python? Say you delete an element in a list: do you still have to assign it to a reference for it to take effect?

Thanks for the help and clarification!

Kalyan

DataFrame.drop() by default (inplace=False) returns a new copy with the operation performed. It doesn’t change the original object unless we set inplace=True; otherwise we need to save the returned copy back to the same variable. I found this article to be an interesting read that goes more in depth:
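
To make the difference concrete, here is a small sketch on a made-up three-row DataFrame (the column names and values are assumptions for illustration only). It also shows the list case from the question: `del` on a list changes the list itself, so no reassignment is needed there.

```python
import pandas as pd

df = pd.DataFrame({"App": ["A", "B", "C"], "Reviews": ["159", "3.0M", "87510"]})

# Default behaviour: drop() returns a new DataFrame and leaves df untouched.
dropped = df.drop(df.index[1])
print(len(df), len(dropped))

# Option 1: assign the result back to the same name.
reassigned = df.copy()
reassigned = reassigned.drop(reassigned.index[1])

# Option 2: ask drop() to modify the object itself (it then returns None).
in_place = df.copy()
in_place.drop(in_place.index[1], inplace=True)

# Plain lists are different: del mutates the list directly.
rows = ["A", "B", "C"]
del rows[1]
print(rows)
```

So whether a method changes the object or returns a new one depends on how that method was designed, not on a general Python rule.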

This thread on Stack Overflow had some discussion about whether or not to use the inplace parameter in case you were interested in that as well:

Thanks a lot April. The article is very helpful.

Also, the reference material was suggesting to use del data[10472], and it was giving an error in my code. I tried del data[10472:10473] and that did not work either. Is this not working for the same reason?

Thanks,
Kalyan

I think it has to do with the nature of the pandas library. In the original project we read the dataset as a list of lists (not as a DataFrame object), and del works just fine on lists and dictionaries (here are some examples). del can be used to delete a column in a DataFrame, but I haven’t seen it used for rows. I found this thread on Stack Overflow that explained it this way:

You can’t remove a row with del as rows returned by .loc or .iloc are copies of the DataFrame, so deleting them would have no effect to your actual data.
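
A short sketch of what this looks like in practice, using a made-up DataFrame (names and values are assumptions): with `del`, the key is looked up among the column labels, so an integer such as a row number raises a KeyError unless a column happens to have that label.

```python
import pandas as pd

df = pd.DataFrame({"App": ["A", "B", "C"], "Reviews": [159, 0, 87510]})

# del df[key] looks for a column labelled `key`, not for a row.
try:
    del df[1]
    outcome = "deleted"
except KeyError:
    outcome = "KeyError"  # there is no column labelled 1
print(outcome)

# Deleting a column by its label works.
del df["Reviews"]
print(list(df.columns))

# For rows, use drop() and keep the result.
df = df.drop(df.index[1])
print(df["App"].tolist())
```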

I know this has been discussed, but can I just keep a del line in the notebook so that the row is deleted every time the code is executed, while it remains present in the original file?
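
One possible approach, sketched below on toy data, is to guard the deletion so that running the cell a second time does not remove a different, valid row. The helper name `delete_bad_row` and the length check are assumptions for illustration, not the project's solution code. `del` only changes the list in memory, so the original file on disk is not modified.

```python
# Toy stand-in for the dataset, header row included.
google_data = [
    ["App", "Category", "Rating", "Reviews"],
    ["App A", "ART_AND_DESIGN", "4.1", "159"],
    ["Broken App", "1.9", "19"],  # malformed row: one value missing
    ["App B", "GAME", "4.5", "87510"],
]


def delete_bad_row(dataset, index, expected_length):
    """Delete dataset[index] only if that row has the wrong number of values."""
    if index < len(dataset) and len(dataset[index]) != expected_length:
        del dataset[index]
        return True
    return False


# Hypothetical index of the bad row in this toy list.
BAD_INDEX = 2

first_run = delete_bad_row(google_data, BAD_INDEX, len(google_data[0]))
# Running the same cell again is now harmless: "App B" sits at index 2 and is kept.
second_run = delete_bad_row(google_data, BAD_INDEX, len(google_data[0]))
print(first_run, second_run, len(google_data))
```

The index to use depends on whether the header row is still part of the list: the earlier post prints google_dataset[10473], while a later one refers to row 10472, so it is worth printing the row before deleting it.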

Hi @Prem, I am getting the same error as above. I have deleted the android data set row ‘10473’ which contains the value ‘3M’ under the ‘Reviews’ column. And after that, added the code to create a new dictionary as given in the solution. Could you please let me know why I am running into this problem? I have attached the notebook below. Thanks, much appreciated! Data_Analysis_Project.ipynb (11.1 KB)

Did you check the notebook after restarting and running all the cells?