Guided project: analyzing the Google Play Store data, review count column has letters

So, I’m at the point in the guided project where I have to put the review counts into a list to build a dictionary of app name: review count, but the reviews are frequently formatted like “40M”, “400K”, “3M”.

Other than pulling out and converting all of those values with 'if “M” in reviewcount' (which I don’t think I can do), is there anything I can do with that column?

Hi @westlundderek, welcome to the community!

I believe you’re looking for the review counts at the wrong index of each row. Look at the header row you printed with the explore_data function to see where the reviews are. The index right after it (in the Android data set) holds the app’s size, which is the only column I see with values like ‘40M’, so I suspect that’s where the confusion lies.

I hope that helps.

'App',       'Category', 'Rating', 'Reviews',  'Size',
'Instagram', 'SOCIAL',   '4.5',    '66577313', 'Varies with device'
 0,           1,          2,        3,          4
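Rather than counting positions by hand, you can ask the header row for the index of a column name. A minimal sketch, using a hypothetical `header` list that mirrors the five column names above and the Instagram row as sample data:

```python
# Hypothetical header and row, copied from the five columns shown above.
header = ['App', 'Category', 'Rating', 'Reviews', 'Size']
row = ['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device']

# list.index() returns the position of the first matching element.
reviews_idx = header.index('Reviews')
size_idx = header.index('Size')

print(reviews_idx, row[reviews_idx])  # 3 66577313
print(size_idx, row[size_idx])        # 4 Varies with device
```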

      2 max_reviews = {}
      3 for apps in google_body:
----> 4     n_reviews = float(apps[3])
      5     name = apps[0]
      6     if (name in max_reviews) and (max_reviews[name] < n_reviews):

ValueError: could not convert string to float: '3.0M'

If my counting is correct, I am selecting the “Reviews” column, but some of those values have been truncated in a very unhelpful manner.
Additionally, I checked the solution guide, and it also uses index 3.
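One way to track down the row behind that error is to attempt the conversion on every row and collect the ones that fail. A sketch with a small hypothetical `rows` list, built from rows quoted later in this thread and trimmed to five columns; in the project you would loop over your own data list instead:

```python
# Hypothetical sample rows (first five columns of rows quoted in this thread).
rows = [
    ['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+'],
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M'],
]

bad_rows = []
for i, row in enumerate(rows):
    try:
        float(row[3])
    except ValueError:
        # float() raises ValueError on strings such as '3.0M'
        bad_rows.append((i, row[0], row[3]))

print(bad_rows)  # [(1, 'Life Made WI-Fi Touchscreen Photo Frame', '3.0M')]
```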

You got it!

The third screen of that mission mentions that one row in the Android data set has an error. Once that row is deleted, your loop should work as expected.

If you’re having trouble with this part, this post should help: I don't know why str cant be converted to float
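The broken row is missing a field (it has no category), so every value after the app name sits one position to the left, which is why index 3 holds the size ‘3.0M’. A sketch that finds such rows by comparing field counts, assuming well-formed rows have 13 fields as the rows printed later in this thread do. The helper `find_malformed_rows` is hypothetical, not part of the guided project:

```python
def find_malformed_rows(rows, expected_len):
    """Return the indices of rows whose field count differs from expected_len."""
    return [i for i, row in enumerate(rows) if len(row) != expected_len]

# Hypothetical sample taken from the explore_data output in this thread.
rows = [
    ['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0',
     'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free',
     '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'],
    ['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free',
     '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up'],
]

bad = find_malformed_rows(rows, expected_len=13)
print(bad)  # [1]

# Delete from the highest index down so earlier deletions don't shift later indices.
for i in reversed(bad):
    del rows[i]

print(len(rows))  # 2
```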

Okay, you’re right, I forgot about that row deletion. The problem now is that I have an earlier code cell that deletes that row and also confirms it has been deleted, but somehow the row doesn’t seem to get deleted.

explore_data(google_body, 10470, 10475)
print(len(google_data))
del google_data[10472]
print(len(google_data))

['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']

['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']

10842
10841
['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']

['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']

So the ‘Life Made WI-Fi Touchscreen Photo Frame’ row should have been deleted, but it’s still there.
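A likely cause, assuming google_body was created by slicing off the header (for example google_body = google_data[1:], which is an assumption here, not shown in the thread): a slice is a new list, so del on google_data leaves google_body untouched. The same row also has a different index in each list. A small sketch with hypothetical stand-in data:

```python
# Hypothetical stand-in data: a header followed by three rows.
google_data = [['App'], ['A'], ['B'], ['C']]
google_body = google_data[1:]   # slicing builds a new, independent list

del google_data[2]              # removes ['B'] from google_data only

print(len(google_data))  # 3
print(len(google_body))  # 3  (google_body still contains ['B'])

# Index i in google_body corresponds to index i + 1 in google_data,
# so delete from the list you actually loop over, at that list's index.
del google_body[1]
print(google_body)  # [['A'], ['C']]
```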

Okay, I wasn’t consistently cleaning google_body (I deleted the row from google_data instead). I fixed that, but now I have this fun part:

max_reviews = {}
for apps in google_body:
    n_reviews = float(apps[3])
    name = apps[0]
    if (name in max_reviews) and (max_reviews[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif (name not in name):
        reviews_max[name] = n_reviews
print(len(max_reviews))

This prints 0.

For mishaps like this, you can always fix the code, restart the kernel, and rerun your cells; the original data is still intact.

For the max_reviews dictionary: your if and elif branches assign to reviews_max, so max_reviews never gets populated. :wink:

if (name in max_reviews) and (max_reviews[name] < n_reviews):
    reviews_max[name] = n_reviews
elif (name not in name):
    reviews_max[name] = n_reviews

Thanks. The easy mistakes, always the easy mistakes.
Edit: name not in name. I see why some devs get stress disorders.
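With both fixes applied (one dictionary name throughout, and checking name not in max_reviews), the loop might look like the sketch below. It also shows why the original printed 0: name not in name is always False for a string, because every string contains itself, so neither branch ever ran. The rows are hypothetical stand-ins for google_body, trimmed to four columns:

```python
# Hypothetical stand-in for google_body: [name, category, rating, reviews].
google_body = [
    ['App A', 'TOOLS', '4.0', '100'],
    ['App A', 'TOOLS', '4.1', '250'],
    ['App B', 'COMMUNICATION', '3.4', '49'],
]

print('App A' not in 'App A')  # False: a string always contains itself

max_reviews = {}
for app in google_body:
    name = app[0]
    n_reviews = float(app[3])
    # Keep only the highest review count seen for each app name.
    if name in max_reviews and max_reviews[name] < n_reviews:
        max_reviews[name] = n_reviews
    elif name not in max_reviews:
        max_reviews[name] = n_reviews

print(len(max_reviews))  # 2
print(max_reviews)       # {'App A': 250.0, 'App B': 49.0}
```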

That’s why they have these: https://en.wikipedia.org/wiki/Rubber_duck_debugging :rofl:

Good luck with the rest of your project!

Thanks a lot for all the help.
