What I expected to happen: Calculate the average number of comments.
What actually happened:
```
ValueError Traceback (most recent call last)
in ()
      2
      3 for post in ask_posts:
----> 4     total_ask_comments += int(post[4])
      5
      6 avg_ask_comments = total_ask_comments / len(ask_posts)
ValueError: invalid literal for int() with base 10: 'h'
```
Other details: My understanding is that the post[4] string can't be converted to an integer because some of the data contain the letter 'h'. Am I correct? If so, how do I fix this? My code is the same as the one in the solution…
I'm new to Dataquest and the community forum, and I'm barely navigating this space… I included the screen link. Is that what you meant by "sharing notebook", or is it something else, and how can I do it?
You can avoid the "invalid literal for int() with base 10" error by using Python's isdigit() string method to check whether the value is a number before converting it. The method returns True if all the characters are digits, otherwise False.
```python
num = "55.55"
if num.isdigit():
    print(int(num))
else:
    print("String value is not a digit : ", num)
```
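Note that isdigit() also returns False for strings such as "-7", which int() accepts. As an alternative, here is a minimal sketch that lets int() decide and catches the failure. The helper name `to_int_or_none` is hypothetical and not part of the course solution.

```python
def to_int_or_none(value):
    """Return int(value), or None if the string is not a valid integer."""
    try:
        return int(value)
    except ValueError:
        # int() rejects strings like "h" or "55.55"
        return None


print(to_int_or_none("55"))     # 55
print(to_int_or_none("-7"))     # -7
print(to_int_or_none("55.55"))  # None
print(to_int_or_none("h"))      # None
print("-7".isdigit())           # False, although int("-7") works
```

Either check only hides the symptom here. If every value at index 4 fails the check, the list probably does not hold full rows.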
The error might have occurred in the second line. Compare with this:
```python
total_ask_comments = 0
for row in ask_posts:
    num_comments = int(row[4])  # index 4 holds the comment count as a string
    total_ask_comments += num_comments

# Divide by the number of posts, not by the length of a single row
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)
```
Or the error might have occurred when the lists were filled. From what I have observed, most users who hit this appended only the title instead of the whole row. See below:
```python
# empty list creation
ask_posts = []
show_posts = []
other_posts = []

# looping over hn
for row in hn:
    title = row[1]
    if 'ask hn' in title.lower():
        ask_posts.append(row)
    elif 'show hn' in title.lower():
        show_posts.append(row)
    else:
        other_posts.append(row)

# number of posts in each category
print(len(hn))
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))
```
Hi, I think it is a logic problem. In the previous step, we put only the titles that start with lowercase 'show hn' or 'ask hn' into the ask_posts and show_posts lists. We did not put the whole row into the list… If you print(ask_posts), you see only titles… That is why we can't convert to a number at this step… Is that correct?
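To make that concrete, here is a small sketch with made-up rows. The sample data and the names `titles_only` and `whole_rows` are hypothetical; only the positions of the title (index 1) and the comment count (index 4) come from the code in this thread. It assumes the lower-cased title was what got appended, which would explain the 'h' in the error message.

```python
# Hypothetical rows: index 1 is the title, index 4 is the comment count
hn = [
    ["1", "Ask HN: How do I learn Python?", "", "10", "6"],
    ["2", "Show HN: My side project", "http://example.com", "4", "2"],
]

titles_only = []
whole_rows = []
for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        titles_only.append(title.lower())  # the mistake: stores a string
        whole_rows.append(row)             # the fix: stores the full row

post = titles_only[0]
print(repr(post[4]))  # 'h', the fifth character of "ask hn: ..."

try:
    int(post[4])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'h'

print(int(whole_rows[0][4]))  # 6
```

Indexing a string with [4] returns a single character, while indexing a row with [4] returns the comment-count field.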
I had the same problem, and my mistake was the same: I appended only the title. I worked around it before coming here and reading this by comparing the titles appended in ask_lists with the titles in the list of lists using a nested loop. (I have since changed it to append the whole row instead of the title. I believe this is a better piece of code.)
But when I computed the total number of comments and then the average, the two approaches gave slightly different results.
Comparing the titles, I get a total of 24499 comments and an average of 14.0475.
Adding up index 4 of each row in ask_posts, I get a total of 24483 and an average of 14.0384.
I know this is probably a negligible difference in such a big data set, but I am wondering what contributes the extra value. What am I doing wrong in the first method?
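Both averages are consistent with the same number of posts (roughly 1744 in each case), so only the totals differ, by 16. One possible cause, which this thread does not confirm, is duplicate titles: if two posts share a title, a nested loop that matches by title counts each of them once per stored copy. A minimal sketch with hypothetical data:

```python
# Hypothetical rows: index 1 is the title, index 4 is the comment count.
# The first two rows deliberately share a title.
hn = [
    ["1", "Ask HN: Best editor?", "", "3", "5"],
    ["2", "Ask HN: Best editor?", "", "1", "3"],
    ["3", "Ask HN: Career advice", "", "2", "4"],
    ["4", "Show HN: A tool", "", "7", "9"],
]

ask_titles = [row[1] for row in hn if row[1].lower().startswith("ask hn")]
ask_rows = [row for row in hn if row[1].lower().startswith("ask hn")]

# Method 1: match every stored title against every row
total_by_title = 0
for title in ask_titles:
    for row in hn:
        if row[1] == title:
            total_by_title += int(row[4])

# Method 2: sum index 4 of the stored rows
total_by_row = sum(int(row[4]) for row in ask_rows)

print(total_by_title)  # 20: the duplicated title is counted twice over
print(total_by_row)    # 12
```

If this is what happened, comparing len(ask_titles) with len(set(ask_titles)) on the real data would show whether any titles repeat.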