Guided Project: Average rating calculations return as "nan"

Screen Link:
https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/9/most-common-apps-by-genre-part-one

My Code:

print('Average rating per category:')
print('\n')

droid_categ_freq = check_percent(free_uniq_Eng_androids[1:],1)
for category in droid_categ_freq:
    total_apps = 0
    total_rating = 0
    for row in free_uniq_Eng_androids[1:]:
        ctg = row[1]
        if ctg == category:
            total_apps += 1
            rating = float(row[2])
            total_rating += rating
    avg_category = total_rating/total_apps
    print(category,":",avg_category)

What I expected to happen:

Hi, after cleaning the data I finally got to analyze it. However, when I tried to find the average rating per category for the Android apps, my results showed “nan” for every category except “ENTERTAINMENT”. My code is nearly identical to the solution, so I don’t understand why this happened. I would appreciate it if someone could explain it to me.

What actually happened:

Average rating per category:


ART_AND_DESIGN : nan
AUTO_AND_VEHICLES : nan
BEAUTY : nan
BOOKS_AND_REFERENCE : nan
BUSINESS : nan
COMICS : nan
COMMUNICATION : nan
DATING : nan
EDUCATION : nan
ENTERTAINMENT : 4.118823529411763
EVENTS : nan
FINANCE : nan
FOOD_AND_DRINK : nan
HEALTH_AND_FITNESS : nan
HOUSE_AND_HOME : nan
LIBRARIES_AND_DEMO : nan
LIFESTYLE : nan
GAME : nan
FAMILY : nan
MEDICAL : nan
SOCIAL : nan
SHOPPING : nan
PHOTOGRAPHY : nan
SPORTS : nan
TRAVEL_AND_LOCAL : nan
TOOLS : nan
PERSONALIZATION : nan
PRODUCTIVITY : nan
PARENTING : nan
WEATHER : nan
VIDEO_PLAYERS : nan
NEWS_AND_MAGAZINES : nan
MAPS_AND_NAVIGATION : nan

@hongthien798: I haven’t attempted this fully myself. If I’m not mistaken, there are + symbols in the Installs column, so some extra data cleaning is needed there. Trying to add a rating of type string to a numeric accumulator (such as total_rating) will not work, because a string cannot be added to a number.
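A minimal sketch of that extra cleaning step, assuming install counts are stored as strings such as '1,000,000+'. The sample values and the helper name `parse_installs` are made up for illustration and are not taken from the project:

```python
def parse_installs(raw):
    """Convert an install-count string such as '1,000,000+' to a float."""
    # Strip the thousands separators and the trailing plus sign
    cleaned = raw.replace(',', '').replace('+', '')
    return float(cleaned)

# Hypothetical sample values, not real rows from the dataset
samples = ['10,000+', '1,000,000+', '500+']
parsed = [parse_installs(s) for s in samples]
print(parsed)
```

Once the values are numeric, they can be summed and averaged per category in the same loop structure as the code in the question.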

In addition, the project looks at the number of installs rather than the rating as the factor for assessing an app, which avoids the NaN issue for some categories (removing the NaN values might be infeasible, depending on the dataset, as the “Beauty” category has multiple NaNs). It’s also good to open the dataset in Excel, if possible, to do some EDA before starting your own exploration.
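To illustrate why one missing rating is enough to turn a whole category average into nan: `float('NaN')` does not raise an error, it returns a float nan, and any sum that includes nan is also nan. The sketch below uses made-up rows, and the `average_rating` helper and its `skip_nan` flag are hypothetical; only the column positions follow the code in the question:

```python
import math

# Made-up rows in the shape [name, category, rating]
rows = [
    ['App A', 'BEAUTY', '4.5'],
    ['App B', 'BEAUTY', 'NaN'],
    ['App C', 'BEAUTY', '4.0'],
    ['App D', 'ENTERTAINMENT', '4.2'],
]

def average_rating(rows, category, skip_nan=True):
    """Average the ratings of one category, optionally ignoring nan ratings."""
    total = 0.0
    count = 0
    for row in rows:
        if row[1] != category:
            continue
        rating = float(row[2])  # float('NaN') returns nan without an error
        if skip_nan and math.isnan(rating):
            continue  # leave missing ratings out of both the sum and the count
        total += rating
        count += 1
    return total / count if count else float('nan')

print(average_rating(rows, 'BEAUTY', skip_nan=False))  # nan
print(average_rating(rows, 'BEAUTY'))                  # 4.25
```

Whether skipping missing ratings is appropriate depends on how many there are; if most rows in a category lack a rating, the average of the remainder may not be representative.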

Hope this helps!


Thank you so much! I hadn’t read through all of the solutions for that part before trying on my own; I stopped at the iOS part before getting to the Android part. I also didn’t realize that NaN values are an issue that can be cleaned out of a dataset, so this has been very helpful.

Again, thanks a bunch!


It’s fine, @hongthien798. Just remember that EDA is very important for inspecting your data and cleaning out any unnecessary or blank columns (and sometimes weeding out obvious outliers)! As the saying goes, “Garbage in, garbage out”, so be sure to check your data in future projects before proceeding.
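As one possible form of that check, here is a rough sketch that counts, per category, how many rows have a rating that is missing or not a number. The rows are invented for the example, and `is_missing` is a hypothetical helper; the column positions are assumed to match the code in the question:

```python
import math
from collections import Counter

# Made-up rows in the shape [name, category, rating]
rows = [
    ['App A', 'BEAUTY', 'NaN'],
    ['App B', 'BEAUTY', ''],
    ['App C', 'COMICS', '4.1'],
    ['App D', 'COMICS', 'NaN'],
    ['App E', 'ENTERTAINMENT', '4.2'],
]

def is_missing(value):
    """Return True if the value is nan or cannot be parsed as a number."""
    try:
        return math.isnan(float(value))
    except ValueError:
        return True  # empty or non-numeric strings count as missing

missing = Counter()
for row in rows:
    if is_missing(row[2]):
        missing[row[1]] += 1

for category, count in missing.items():
    print(category, ':', count)
```

A category that never appears in `missing` has a usable rating in every row.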

Happy Coding!
