Python for Data Scientists - Fundamentals: Is the iOS data different?

Dear aspiring data scientists :slight_smile:

I am working through the very first course in my Data Analyst with Python path.

In the guided project (Profitable App Profiles for the App Store and Google Play Markets), the iOS data seems to be fundamentally different.

In step 12 (Most Popular Apps by Genre on the App Store) I get a completely different result than in the solution notebook.

I get average ratings below 10, whereas the solution notebook shows values in the thousands for the same calculation.

So far, however, all other calculations have worked out fine, so I assume the data I prepared in the previous steps is correct.

My results:
image

Any ideas?

Hey, Phil.

Can you please share your notebook so that people can dig into this? Thanks!


Hi @Bruno

sorry, of course…

Please see below…
KS App Analysis.ipynb (412.6 KB)

The problem is in the line for n_ratings:
n_ratings = float(app[5])

In your ios_english data, app[5] references the price column, which would explain the averages below 10. You will want to change this to app[6] to get the number of ratings.

I think the dataset that’s downloadable from Kaggle is slightly different than the one used within the guided project on Dataquest and in the solution notebook. There seems to be an extra column at the beginning with index numbers. You can spot it when you go to the beginning of your code where you explored the first few lines of the ios dataset and compare it to what is in the solution notebook.
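One way to avoid this kind of off-by-one is to look up the column position from the header row instead of hard-coding it. Below is a minimal sketch, not the solution notebook's approach. The helper names (column_index, avg_ratings_by_genre) are made up for illustration, the column names rating_count_tot and prime_genre are what I'd assume the header contains, and the sample rows are invented and cut down to a handful of columns.

```python
def column_index(header, name):
    """Return the position of `name` in the header row, with a clear error if it is missing."""
    try:
        return header.index(name)
    except ValueError:
        raise KeyError(f"column {name!r} not found in header: {header}") from None


def avg_ratings_by_genre(header, rows,
                         genre_col="prime_genre",        # assumed column name
                         count_col="rating_count_tot"):  # assumed column name
    """Average number of ratings per genre, locating columns by name."""
    genre_idx = column_index(header, genre_col)
    count_idx = column_index(header, count_col)
    totals, counts = {}, {}
    for row in rows:
        genre = row[genre_idx]
        totals[genre] = totals.get(genre, 0.0) + float(row[count_idx])
        counts[genre] = counts.get(genre, 0) + 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


# Invented sample data in two layouts: without and with a leading index column.
plain_header = ["id", "track_name", "size_bytes", "currency",
                "price", "rating_count_tot", "prime_genre"]
plain_rows = [
    ["1", "App A", "100", "USD", "0.0", "2000", "Games"],
    ["2", "App B", "200", "USD", "1.99", "4000", "Games"],
]
indexed_header = [""] + plain_header
indexed_rows = [[str(i)] + row for i, row in enumerate(plain_rows, start=1)]

# Both layouts give the same result because the index comes from the header.
print(avg_ratings_by_genre(plain_header, plain_rows))
print(avg_ratings_by_genre(indexed_header, indexed_rows))

# A hard-coded app[5] picks up the price in the layout with the extra column.
print(float(indexed_rows[1][5]))
```

With this, the same code works whether or not the file has the extra index column, and a renamed or missing column fails loudly instead of silently averaging the wrong field.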


April is probably right. I am unable to reproduce your values using your notebook, Phil.

I’m potentially using different files, though. If you still can’t figure it out, can you also provide the datasets you’re using? I know you mention where to get them in your notebook, but Kaggle wants my personal information, and I can’t be bothered with that :slight_smile: