I’m working on the first guided project and have been using the solution notebook to check that my own code gives the “right answers”.
I noticed a small discrepancy in the Play Store dataset between my code and the solution notebook.
In the step where you filter for free apps, I suspect most students use an if statement to check whether price == 0 and append to a new list, which is exactly what I did. However, in the Play Store dataset, a few entries have a $ sign in the price, which prevents a simple int/float comparison; you have to compare strings, as in the solution notebook.
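For anyone who would rather keep a numeric comparison, here is a minimal sketch of one way around the $ sign. The helper name `price_to_float` and the sample prices are made up for illustration; this is not what the solution notebook does.

```python
def price_to_float(price_text):
    """Convert a price string such as '0' or '$4.99' to a float.

    Assumes the only non-numeric character is a '$' sign.
    """
    return float(price_text.replace('$', ''))


# Hypothetical price strings, not taken from the real dataset.
sample_prices = ['0', '$4.99', '$0.99']

# True where the app is free according to its price.
free_flags = [price_to_float(p) == 0 for p in sample_prices]
print(free_flags)
```

Comparing the raw string to '0' gives the same result with less work, which is probably why the solution takes that route.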
Since I didn’t think of using a string, I used the “free or paid” column (column 7, index 6 in the dataset). In that case, the list of free Play Store apps has a length of 8863 instead of the 8864 expected in the solution.
This suggests that somewhere an app with price = ‘0’ is not actually marked as ‘Free’.
Has anyone else noticed this? I know this is more of a dataset issue, but I can’t see any mention of it on Kaggle.
Thanks!
Thanks for your reply.
So it seems that the app in question is Command & Conquer: Rivals, at index 9149 (row 9150) in the original Play Store dataset. As seen below, the price is zero but the value in column 7 is missing.
['Command & Conquer: Rivals', 'FAMILY', 'NaN', '0', 'Varies with device', '0', 'NaN', '0', 'Everyone 10+', 'Strategy', 'June 28, 2018', 'Varies with device', 'Varies with device']
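A rough sketch of how the two filters can be compared to surface a row like this one. The first two rows ('App A', 'App B') are invented for illustration; only the third is the row printed above. The column positions (Type at index 6, Price at index 7) follow the description in this thread.

```python
TYPE_COL = 6   # 'Free' / 'Paid' column
PRICE_COL = 7  # price stored as a string, e.g. '0' or '$4.99'

sample_rows = [
    # Hypothetical rows, made up for this example.
    ['App A', 'TOOLS', '4.1', '10', '5M', '1,000+', 'Free', '0',
     'Everyone', 'Tools', 'January 1, 2018', '1.0', '4.0 and up'],
    ['App B', 'GAME', '4.5', '20', '30M', '5,000+', 'Paid', '$4.99',
     'Everyone', 'Arcade', 'March 3, 2018', '2.1', '4.1 and up'],
    # The row quoted in this thread.
    ['Command & Conquer: Rivals', 'FAMILY', 'NaN', '0', 'Varies with device',
     '0', 'NaN', '0', 'Everyone 10+', 'Strategy', 'June 28, 2018',
     'Varies with device', 'Varies with device'],
]

# Filter 1: price string equals '0' (the string-comparison approach).
free_by_price = [row for row in sample_rows if row[PRICE_COL] == '0']

# Filter 2: Type column equals 'Free' (the approach I used).
free_by_type = [row for row in sample_rows if row[TYPE_COL] == 'Free']

# Apps with a zero price that are not labelled 'Free'.
mismatches = [row[0] for row in free_by_price if row[TYPE_COL] != 'Free']

print(len(free_by_price), len(free_by_type))
print(mismatches)
```

On the full dataset the same comparison should account for the one-row gap between 8864 and 8863, assuming no other rows disagree.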
Could this be passed on to the Dataquest team so that it is included in the data cleaning step?
Thank you for bringing this to our attention. Yes, that missing value can certainly cause trouble if we are using the Type column. Perhaps we can add an instruction to delete this row on screen 3. I will get it logged for review.
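Until an instruction like that is added, one possible way to handle the row in your own notebook is sketched below. It assumes the missing Type is stored as the string 'NaN', as in the row quoted above; 'App A' is an invented row, and this is not the official cleaning step.

```python
TYPE_COL = 6  # 'Free' / 'Paid' column

sample_rows = [
    # Hypothetical row, made up for this example.
    ['App A', 'TOOLS', '4.1', '10', '5M', '1,000+', 'Free', '0',
     'Everyone', 'Tools', 'January 1, 2018', '1.0', '4.0 and up'],
    # The row quoted in this thread, with a missing Type.
    ['Command & Conquer: Rivals', 'FAMILY', 'NaN', '0', 'Varies with device',
     '0', 'NaN', '0', 'Everyone 10+', 'Strategy', 'June 28, 2018',
     'Varies with device', 'Varies with device'],
]

# Keep only rows whose Type value is present.
cleaned_rows = [row for row in sample_rows if row[TYPE_COL] != 'NaN']

print(len(sample_rows), len(cleaned_rows))
```

Filtering on the value rather than deleting by position avoids depending on the row still being at the same index after earlier cleaning steps.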