Guided Project: Exploring Hacker News Posts, Step 4: I'm having a problem converting the number of comments from string to integer using int()

Screen Link: https://app.dataquest.io/m/356/guided-project%3A-exploring-hacker-news-posts/4/calculating-the-average-number-of-comments-for-ask-hn-and-show-hn-posts

Your Code:
```
total_ask_comments = 0

for post in ask_posts:
    total_ask_comments += int(post[4])

avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)
```

What I expected to happen: Calculate the average number of comments.

What actually happened:
```
ValueError                                Traceback (most recent call last)
in ()
      2
      3 for post in ask_posts:
----> 4     total_ask_comments += int(post[4])
      5
      6 avg_ask_comments = total_ask_comments / len(ask_posts)

ValueError: invalid literal for int() with base 10: 'h'
```

Other details: My understanding is that the post[4] string can’t be converted to an integer because some of the data contains the letter ‘h’. Am I correct? If so, how do I go about fixing this? My code is the same as the one in the solution…


Hey, Thaihoan. Your diagnosis is on point!

To help us see what’s going on here, please share your notebook.

Hi Bruno,

I’m new to Dataquest and the community forum, and I’m still learning to navigate this space. I included the screen link. Is that what you meant by “sharing notebook”, or is it something else? If so, how can I do it?

Thanks.

A notebook is the file created by Jupyter (the one with the cells and the output).

If you’re working in the app, you can download it. If you’re working locally, you’ll need to locate it. Notebook file names typically end with .ipynb.


I managed to spot my mistake and fixed it. Thanks @Bruno for the notebook advice! I might ask for your help soon again :slightly_smiling_face:

Hello Nguyen, I am having a similar problem. How did you sort it out?

Find link to my notebook here

http://localhost:8888/notebooks/documents/dataquest/hacker%20news%20guided%20project%20issue.ipynb#

Peter

A post was split to a new topic: Mismatch with Hacker News GP Solution

A post was split to a new topic: ValueError: invalid literal for int() with base 10: ‘h’

Hi @jamoko77.poo, here is the solution: this link


You can avoid the "invalid literal for int() with base 10" error by using Python's str.isdigit() method to check whether the value is a whole number before converting it. The method returns True if the string is non-empty and all of its characters are digits, otherwise False.

num = "55.55"
if num.isdigit():
  print(int(num))
else:
  print("String value is not a digit : " , num)

The error might have occurred on the second line. Compare with this:

total_ask_comments = 0
for row in ask_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_ask_comments += num_comments
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

Or the error might have occurred when building the lists. Most users who hit this error likely appended the title instead of the whole row. See the code below:


# empty list creation
ask_posts = []
show_posts = []
other_posts = []
# looping over hn
for row in hn:
    title = row[1]
    
    if 'ask hn' in title.lower():
        ask_posts.append(row)
    elif 'show hn' in title.lower():
        show_posts.append(row)
    else:
        other_posts.append(row)

# number of posts in each category
print(len(hn))
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

Hi, I think it is a logic problem. In the last step, we put only the titles that start with lowercase ‘show hn’ or ‘ask hn’ into the ask_posts and show_posts lists. We did not put the whole row into the lists. If you try print(ask_posts), you see titles only. That is why we can’t convert to a number at this step. Is that correct?
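
Here is a minimal sketch of that situation, assuming the list was built by appending the lowercased title instead of the row. The sample row is made up for illustration, and the column order (title at index 1, number of comments at index 4) is an assumption based on the code in this thread.

```python
# Hypothetical row: id, title, url, num_points, num_comments, author, created_at
row = ["1", "Ask HN: How do I learn Python?", "", "10", "6", "someone", "1/1/2016 0:00"]

titles_only = [row[1].lower()]   # the mistake: only the lowercased title is stored
whole_rows = [row]               # the fix: the full row is stored

# Indexing a string returns a single character, not a column
char = titles_only[0][4]
print(repr(char))

# Converting that character fails with the same kind of ValueError
try:
    int(char)
except ValueError as err:
    print(err)

# Indexing a full row returns the comments column, which converts cleanly
print(int(whole_rows[0][4]))
```

With titles only, index 4 picks the fifth character of the title, which would explain an error message that mentions a single letter.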

I had the same problem and figured out that my mistake was the same: I appended only the title. I solved the issue before coming here and reading this, by comparing the titles appended to ask_lists with the titles in the list of lists using a nested loop. (I have now changed it to append the whole row instead of the title. I believe this is a better piece of code.)

But when I calculated the total number of comments and then the average, the results differed slightly.
The total I get by comparing the titles is 24499, with an average of 14.0475.

When I calculate the total by adding up index 4 of each row in ask_posts, I get 24483, with an average of 14.0384.

I know that this is probably a negligible difference in such a big data set, but I am wondering what is contributing the extra value. What am I doing wrong in the first method?
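
One possible cause, which can't be confirmed without the notebook, is duplicate titles: if two rows share the same title, matching by title in a nested loop counts each of them more than once. A small sketch with made-up rows, assuming the title is at index 1 and the comment count at index 4:

```python
# Made-up rows; the first two posts share the same title
hn = [
    ["1", "Ask HN: Best editor?", "", "5", "3", "a", ""],
    ["2", "Ask HN: Best editor?", "", "2", "1", "b", ""],
    ["3", "Ask HN: Any book tips?", "", "4", "2", "c", ""],
]

ask_titles = [row[1] for row in hn]   # titles-only list
ask_rows = list(hn)                   # whole-row list

# Method 1: nested loop that matches rows by title
total_by_title = 0
for title in ask_titles:
    for row in hn:
        if row[1] == title:
            total_by_title += int(row[4])

# Method 2: sum the comments column over the stored rows
total_by_row = sum(int(row[4]) for row in ask_rows)

print(total_by_title, total_by_row)
```

In this toy data the title-matching total is larger because the duplicated title matches two rows on each of its two passes. Checking whether len(set(ask_titles)) is smaller than len(ask_titles) would show if the real data has the same pattern.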