Guided Project 1: iOS data on Kaggle differs from the data set used by the solution

While working on the first guided project of the Data Analyst path, I noticed that the iOS data set the instructions link to on Kaggle is definitely different from the one used by the solution. From the start I’ve seen differences: a different first column, different app names in the output (part 1 of removing non-English apps), and a different number of English names (part 2 of removing non-English apps, which is where I am at the moment).

Part 1:

Part 2:

Even copying the solution word for word doesn’t give me the same output as the solution, which worries me a bit. I think I will just proceed for now, but this is weird, and I don’t know if anyone else has the same problem. Dataquest, please update the solution if the data sets are indeed different. Thank you!

Try using these two files: AppleStore.csv (708.8 KB) and googleplaystore.csv (1.3 MB)


Welcome to the forums, @cipiy135!

One way to make sure your datasets are always the same is to click the ‘Jupyter’ icon on the top left, and then access (and download) the CSV files straight from there.


Hi @cipiy135, the data sets are identical. The only difference is that the file we have on our site is sorted by the rating_count_tot column in descending order (this is true for all the missions in the course), so the most popular apps (like Facebook, Instagram, etc.) appear first.
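If you want to check this yourself, here is a minimal sketch of the idea: sort the rows you downloaded by rating_count_tot in descending order and confirm that only the order changes. The three sample rows, the `read_rows` helper, and the reduced column layout are made up for illustration; the real AppleStore.csv has many more columns.

```python
import csv
import io

# Tiny made-up sample standing in for the downloaded file.
# The rows and rating counts are illustrative only.
kaggle_csv = """id,track_name,rating_count_tot
1,PAC-MAN Premium,21292
2,Facebook,2974676
3,Instagram,2161558
"""

def read_rows(text):
    """Hypothetical helper: return (header, data_rows) from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

header, rows = read_rows(kaggle_csv)

# Look the column up by name so the sketch does not depend on column position.
col = header.index("rating_count_tot")

# Sort by total rating count, most-rated apps first.
sorted_rows = sorted(rows, key=lambda row: int(row[col]), reverse=True)
names = [row[header.index("track_name")] for row in sorted_rows]
print(names)

# Same rows, different order: comparing order-insensitively shows no difference.
same_content = sorted(rows) == sorted(sorted_rows)
print(same_content)
```

With the real file you would read from `open('AppleStore.csv', encoding='utf8')` instead of the sample string. Because the row order differs, any output that prints "the first few rows" will look different even though the data is the same.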


I’m scratching my head, and I’m not sure if anyone else is having the same issue.

I have gotten through Part 2 of the non-English apps removal process, and I have no clue why my explore_data() function isn’t returning the same data at this step.

The code looks the same, and I re-uploaded the CSV files. Did anyone else come across this output as well?

Can a team member send me both Excel files and an exported list of the correct data after part two of removing the non-English characters, so I can find the cause?

Hi All,

Similarly, I don’t get the same result as the solution provided.
This is my result:

and this is the solution:

I kept going through the steps, and at the 12th step I found that the data set is not clean: some non-English apps are still present.

Hi @rpmuayyad. It looks like something may have gone wrong in the step that filters out the non-English apps. You’ll want to compare the function you wrote for that step against the solution. There are also some posts on here from other students who had trouble with this part that you can check out. If you’re not able to figure it out, the community might be able to help if you paste your code here so someone can have a look.