Jupyter Guided Project: 5 Removing Duplicate Entries: Part Two - Solution Wrong In Solution Book


Read my code, then scroll down to the solution book code (pasted as a screenshot) and tell me whether mine differs anywhere.
Then look at the error message I get from my code (which is the same as the solution) and tell me you wouldn't feel pranked too, because I certainly did.
Is there some reading between the lines here on DQ that I have been missing since 2018, so that I keep falling for this non-trick every time?

My Code:

reviews_max = {}
duplicate_apps = {}
for app in google_store[1:]:  # skip the header row
    name = app[0]
    n_reviews = float(app[3])  # column 3 holds the review count
    if name in reviews_max:
        # keep only the highest review count seen for this app
        if reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews
    else:
        reviews_max[name] = n_reviews


What I expected to happen:
I expected the dataset/CSV provided to us at this point to be free of dirty data, or the assignment checker not to fail on a string vs. number type issue, so that I could pass the assignment. Otherwise, there should be a clear notice to clean the file before doing this Part Two of the project, along with the range of bad characters to look for. There is no way I could guess that beforehand, nor can I risk changing anything in the dataset before being instructed to, because you never know whether the next part of the project requires those changes NOT to have been made.

What actually happened:
I ended up spending a precious half hour for nothing, EXACTLY in the way I described in my previous post about another Python lesson, where “the language was completely from Mars but I was still on Earth”. The task described in the instructions pane was clearly misleading, making me think I was doing something wrong (I clearly wasn't, because my code and the solution book now have literally the same code minus one line). I got nothing out of putting half an hour into something that led me nowhere. I promise I haven't tampered with the data provided to us, ever.
Nor do the previous parts of the guided project ask me to clean other columns (or the entire dataset, in which case there would be no point to this project, since the project itself is literally about cleaning and analyzing the dataset).
The error I get is as follows:
(This shows the dataset as provided was not fit for direct use in this “Part Two” of the project.)

ValueError                                Traceback (most recent call last)
<ipython-input-78-cff677cf5178> in <module>
     15 for app in google_store:
     16     name = app[0]
---> 17     n_reviews = float(app[3])
     19     if name in reviews_max and reviews_max[name] < n_reviews:

ValueError: could not convert string to float: '3.0M'

The solution given in the solution book is EXACTLY LIKE MY CODE, and it is structured the same way.
If that is so, then why am I getting an error?

[Screenshot of the solution book]

Has anyone else posted this question before? I would have researched it if I hadn't already spent my valuable half hour on the wrong line of thinking. But now that I have lost that time and can't afford to spend more searching thoroughly, I need help: if anyone was in the same situation before, what keywords did they use in the search box to find that topic?
I tried searching for about 5 minutes but failed, because the search returns a plethora of results with oddly worded titles, which makes it hard to guess which one matches my situation. I do not have several hours to click on each question and read them like a coursebook.
When I click on Community Discussion, I land on a GENERAL PAGE and not the page specific to this Part Two, or Chapter 5 of the guided project (which somewhat defeats the purpose of the Community Discussion button as well).

Your code runs fine when I run it.

That means the error is caused by something in google_store, because that's the only differing factor.
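To check that the loop itself is not the problem, it can be run on a few hand-made rows. This is a minimal sketch: `sample` and its values are made up for illustration and only mimic the first four columns of the Google Play CSV (App, Category, Rating, Reviews).

```python
# Hypothetical sample rows; only the first four Google Play columns are mimicked
sample = [
    ["App", "Category", "Rating", "Reviews"],  # header row
    ["Photo Editor", "PHOTOGRAPHY", "4.1", "1500"],
    ["Photo Editor", "PHOTOGRAPHY", "4.1", "1620"],  # duplicate entry with more reviews
    ["Notes", "PRODUCTIVITY", "4.2", "1200"],
]

reviews_max = {}
for app in sample[1:]:  # skip the header row
    name = app[0]
    n_reviews = float(app[3])  # Reviews column
    # Keep the highest review count seen for each app name
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

print(reviews_max)  # {'Photo Editor': 1620.0, 'Notes': 1200.0}
```

With clean rows like these the loop runs without error, so a ValueError points at the data rather than the code.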

The error states:

ValueError: could not convert string to float: '3.0M'

That means that instead of a review count, your data has the string "3.0M" in the Reviews column (index 3).
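A quick way to see this: `float()` parses plain numeric strings but rejects a value with a unit suffix such as `"3.0M"`. The helper `is_number` below is hypothetical, just a small sketch for illustration.

```python
def is_number(text):
    """Return True if float() can parse text, otherwise False (hypothetical helper)."""
    try:
        float(text)
    except ValueError:
        return False
    return True

print(is_number("1200"))  # True
print(is_number("3.0M"))  # False: this is the kind of value that triggers the ValueError
```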

The next step would be to find out which row in your data has the "3.0M" string instead of a numerical value. That wouldn’t be difficult to find out.
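One way to locate it is to loop over the data rows and collect every row whose value in column 3 fails to convert. A minimal sketch, assuming `google_store` is a list of lists read from the CSV with the header at position 0; `find_non_numeric` is a hypothetical name and the demo rows are made up.

```python
def find_non_numeric(rows, column):
    """Return (index, row) pairs whose value in `column` cannot be converted to float.

    Indices are positions within `rows` exactly as passed in.
    """
    bad = []
    for index, row in enumerate(rows):
        try:
            float(row[column])
        except (ValueError, IndexError):  # bad text, or a row too short to have that column
            bad.append((index, row))
    return bad

# Made-up rows; the second one imitates a row with a shifted column
demo = [
    ["Notes", "PRODUCTIVITY", "4.2", "1200"],
    ["Photo Frame", "1.9", "19", "3.0M"],
]
print(find_non_numeric(demo, 3))  # [(1, ['Photo Frame', '1.9', '19', '3.0M'])]
```

Calling `find_non_numeric(google_store[1:], 3)` would report indices relative to the data rows only, so add 1 before using such an index on the full list that still contains the header.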

However, that would not have been necessary, because the third step of the project asks us to delete a specific row, since that row is missing a value and its remaining columns are shifted. Going through that step would have deleted the row where that string appears instead of a numerical value for that column.
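A related check: if a row is missing a value, it has fewer columns than the header, so comparing row lengths can reveal it. This is a sketch under that assumption; `rows_with_wrong_length` and the sample data are hypothetical.

```python
def rows_with_wrong_length(header, rows):
    """Return the indices (within `rows`) of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]

header = ["App", "Category", "Rating", "Reviews", "Size"]
rows = [
    ["Notes", "PRODUCTIVITY", "4.2", "1200", "5.0M"],
    ["Photo Frame", "1.9", "19", "3.0M"],  # one value missing, later values shifted left
]
print(rows_with_wrong_length(header, rows))  # [1]
```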

You seem to have either skipped that step, or you might have deleted a different row instead.
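To make the deletion safe to re-run, one option is to check that the row at the index is the one you expect before deleting it. A minimal sketch; `delete_row_once`, the index, and the app name used below are hypothetical, so look up the actual row before relying on it.

```python
def delete_row_once(rows, index, expected_name):
    """Delete rows[index] only if its first column (the app name) equals expected_name.

    Returns True if a row was deleted. Running it again does nothing once the
    expected row is gone, unless the next row happens to have the same name.
    """
    if 0 <= index < len(rows) and rows[index][0] == expected_name:
        del rows[index]
        return True
    return False

rows = [
    ["App", "Category", "Rating", "Reviews"],
    ["Photo Frame", "1.9", "19", "3.0M"],  # made-up faulty row
    ["Notes", "PRODUCTIVITY", "4.2", "1200"],
]
print(delete_row_once(rows, 1, "Photo Frame"))  # True
print(delete_row_once(rows, 1, "Photo Frame"))  # False: row 1 is now "Notes"
print(len(rows))  # 2
```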

Almost every question in the community gets tagged with the lesson and screen number (manually by the asker, or automatically as long as the asker includes the link to that lesson).

The Get Help button at the bottom of the classroom screens shows the option Community Discussion. Clicking it takes you to a page with all the questions people have posted about that lesson. You could either go through them one by one, or focus on the ones tagged with the lesson and screen number (in this case, for example, 350-5).

Or you could have directly searched for the error itself:

and it would have shown you several questions that could have potentially helped you out.

The Community Discussion button used to point to that specific step in the past, but its functionality was changed, probably because not all questions were being tagged properly with the specific step (or even the lesson). Now it shows a broader set of questions, but it's fairly easy to identify the relevant ones by their tags or even their titles. You could also use the advanced search options on the right-hand side to narrow the results down.

If that’s not to your preference, feel free to use the Contact Us button in the top-right corner to send them feedback about it.

I appreciate your response to my query. However, I haven't removed any row yet except the specific row index that was given in the community discussion thread linked to the previous part of the guided project. I haven't run that code twice, because I was strictly following the instructions, which stressed not to run the delete twice since you might delete a row you didn't intend to. So I deleted the row once and kept my hands off the del command after that.

OK, so how do I reset the state of the dataset, or refresh the kernel? This issue keeps happening even after I reset the kernel and refreshed the session 3 or 4 times.
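The `del` statement only changes the list held in memory; the CSV file on disk is untouched. So one way to get a clean state is to re-read the file and then run the deletion cell exactly once (after a kernel restart, every cell above has to be re-run in order anyway). A sketch assuming the list is built with Python's `csv` module; the file name in the comment is an assumption, and `load_rows` is a hypothetical helper.

```python
import csv
import io

def load_rows(file_obj):
    """Read every row of a CSV file object into a list of lists (header included)."""
    return list(csv.reader(file_obj))

# In the notebook this would re-read the original file (file name assumed):
# with open("googleplaystore.csv", encoding="utf8") as f:
#     google_store = load_rows(f)

# Self-contained demo with an in-memory file standing in for the CSV
demo_file = io.StringIO("App,Reviews\nNotes,1200\n")
google_store = load_rows(demo_file)
print(google_store)  # [['App', 'Reviews'], ['Notes', '1200']]
```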

Thanks for the detailed response.

Hi doctor,

I got the same error message as @neo.vandamme1. Can you please find out why this happens? How do I move forward? I don’t want to spend too much time debugging here.


My response above already points out why this happens -

If the above doesn’t help you then I would suggest searching for similar questions on how to solve the problem. There are some questions here in the community that already discuss the same error.

If none of those helped either, then I recommend creating a separate question, providing all necessary details (including your code and what fixes you have tried so far) and someone can then help you out accordingly.


Just writing to say thank you for your solution. I was having the same problem and it was driving me crazy.

In my case, what happened is that I was misinterpreting the output from the exploredata() function:


So I assumed the index of the faulty row was 10473 instead of 10472 and deleted the wrong one.
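This off-by-one is easy to hit because the same row has different indices depending on whether the header is still in the list, so whether a printed index can be used as-is depends on which list it was computed from. A tiny illustration with made-up rows:

```python
# Made-up data: a header row followed by three data rows
dataset = [["App"], ["A"], ["B"], ["C"]]
data_only = dataset[1:]  # same rows with the header removed

# The same row sits one position earlier once the header is gone
print(dataset.index(["B"]))    # 2
print(data_only.index(["B"]))  # 1
```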
Later, when the “3M not integer” error showed up, I just assumed I was using the wrong column and switched to column 2 instead, which “worked” but obviously led to bigger problems down the road…
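Looking the column up by name in the header, instead of trying another number, avoids silently reading the wrong field. A sketch with a made-up header and row; in the notebook the header would be the first row of the loaded list (an assumption about how the data was read).

```python
header = ["App", "Category", "Rating", "Reviews"]
row = ["Notes", "PRODUCTIVITY", "4.2", "1200"]  # made-up data row

reviews_col = header.index("Reviews")  # look the column up by its name
print(reviews_col)              # 3
print(float(row[reviews_col]))  # 1200.0, the review count

# Column 2 also converts without an error, but it holds the rating
print(float(row[2]))            # 4.2
```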

I am happy that I could trace the problem back to its origin, thanks to you!