First Guided Project - Play Store free apps count discrepancy

Hello everyone!

I’m working on the first guided project and have been using the solution notebook to check that I’m getting the “right answers” with my own code.
I noticed a small discrepancy in the Play Store dataset between my code and the solution notebook.

In the step where you filter for free apps, I suspect most students use an if statement to check whether price == 0 and append the matching rows to a new list, which is exactly what I did. However, in the Play Store dataset the prices are stored as strings and the paid ones include a $ sign, so you can’t simply convert them to int/float for the comparison; the simplest fix is to compare against a string, as the solution notebook does.
Since I didn’t think of using a string, I used the “free or paid” column instead (Type: column 7, index 6 in the dataset). With that approach, the list of free Play Store apps has a length of 8863 instead of the 8864 expected in the solution.
This suggests that somewhere there is an app with a price of ‘0’ that is not marked as ‘Free’.
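
A minimal sketch of the two filters, assuming each row is a list of strings in the Play Store column order (Type at index 6, Price at index 7). The sample rows are hypothetical and trimmed to the first eight columns; the third one mimics a row whose Type value is missing.

```python
# Hypothetical sample rows, trimmed to the first eight Play Store columns:
# App, Category, Rating, Reviews, Size, Installs, Type, Price
rows = [
    ['Free App', 'FAMILY', '4.1', '100', '5M', '1,000+', 'Free', '0'],
    ['Paid App', 'GAME', '4.5', '50', '10M', '100+', 'Paid', '$4.99'],
    ['Odd App', 'FAMILY', 'NaN', '0', 'Varies with device', '0', 'NaN', '0'],
]

TYPE_INDEX = 6   # the "free or paid" column (column 7)
PRICE_INDEX = 7  # the price column, stored as a string

# Converting a paid price straight to a number fails because of the $ sign.
try:
    float(rows[1][PRICE_INDEX])
except ValueError:
    print('float() cannot parse', rows[1][PRICE_INDEX])

# Comparing the price as a string keeps every app priced at '0'.
free_by_price = [row for row in rows if row[PRICE_INDEX] == '0']

# Filtering on Type silently drops any row whose Type is missing ('NaN').
free_by_type = [row for row in rows if row[TYPE_INDEX] == 'Free']

print(len(free_by_price), len(free_by_type))  # 2 1
```

On these toy rows the two filters disagree by exactly one app, the one with a price of '0' but no Type value.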

Has anyone else noticed this? I know this is more of a dataset issue, but I can’t find any mention of it on Kaggle.
Thanks!

Hi @Zacross,

Welcome to the community. That sounds like a good find. Maybe we can look further into it and track down the discrepancy.

Can you find the row where price == 0 but the column 7 value != ‘Free’?
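
One way to do that, as a sketch: the helper name `find_price_type_mismatches` is hypothetical, and it assumes the rows are lists of strings with Type at index 6 and Price at index 7. The index it reports is the position in whatever list you pass in, so it counts a header row if the list still has one.

```python
def find_price_type_mismatches(rows, type_index=6, price_index=7):
    """Return (index, row) pairs where the price is '0' but Type is not 'Free'."""
    return [
        (i, row)
        for i, row in enumerate(rows)
        if row[price_index] == '0' and row[type_index] != 'Free'
    ]


# Hypothetical sample: a header row plus two data rows (first eight columns only).
rows = [
    ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price'],
    ['Free App', 'FAMILY', '4.1', '100', '5M', '1,000+', 'Free', '0'],
    ['Odd App', 'FAMILY', 'NaN', '0', 'Varies with device', '0', 'NaN', '0'],
]

for index, row in find_price_type_mismatches(rows):
    print(index, row[0])  # prints: 2 Odd App
```

The header row is skipped naturally, because its Price value is the string 'Price', not '0'.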

Hello there!

Thanks for your reply!
It seems the app in question is Command & Conquer: Rivals, at index 9149 (row 9150) of the original Play Store dataset. As shown below, its price is zero, but the value in column 7 (Type) is missing:

['Command & Conquer: Rivals', 'FAMILY', 'NaN', '0', 'Varies with device', '0', 'NaN', '0', 'Everyone 10+', 'Strategy', 'June 28, 2018', 'Varies with device', 'Varies with device']

Is this something we could pass on to the Dataquest team so they can include it in the data cleaning step?

Cheers, and Merry Christmas!

Glad you found something interesting in the data. Maybe DQ can add this. Let’s check with @Sahil; he is the right person to answer this.

Hi @Zacross,

Thank you for bringing this to our attention. Yes, that missing value can certainly cause trouble if we are using the Type column. Perhaps we can add an instruction in screen 3 to delete this row. I will get it logged for review.
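
A minimal sketch of what such a cleaning step could look like, assuming the dataset is a list of lists with Type at index 6 and the missing value stored as the string 'NaN', as in the row printed above. The function name and the `android` sample list are hypothetical. It matches rows by value rather than by a hard-coded index such as 9149, because that index shifts whenever an earlier row is deleted.

```python
def delete_rows_missing_type(rows, type_index=6):
    """Delete, in place, every row whose Type value is the string 'NaN'.

    Iterates backwards so that deleting a row does not shift the indices
    of rows that have not been checked yet. Returns the number deleted.
    """
    deleted = 0
    for i in range(len(rows) - 1, -1, -1):
        if rows[i][type_index] == 'NaN':
            del rows[i]
            deleted += 1
    return deleted


# Hypothetical sample: a header row plus two data rows (first eight columns only).
android = [
    ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price'],
    ['Free App', 'FAMILY', '4.1', '100', '5M', '1,000+', 'Free', '0'],
    ['Odd App', 'FAMILY', 'NaN', '0', 'Varies with device', '0', 'NaN', '0'],
]

print(delete_rows_missing_type(android))  # 1
print(len(android))                       # 2
```

Note that this removes every row with a missing Type, not only Command & Conquer: Rivals; if the instruction should target that single app, the condition could also check the app name at index 0.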