What I expected to happen: The number of comments in column index 4 would be converted from a string to an integer.
What actually happened:
```
ValueError                                Traceback (most recent call last)
in ()
      2
      3 for post in ask_posts:
----> 4     total_ask_comments += int(post[4])
      5
      6 avg_ask_comments = total_ask_comments / len(ask_posts)
ValueError: invalid literal for int() with base 10: 'H'
```
Other details: Am I right that Python can't convert column index 4 from a string to an integer because some of the strings contain the letter ‘H’? How do I go about fixing this? The guided project doesn't mention that the data in that column needs cleaning. I compared my code so far against the solution and it's the same. Has anyone run into a similar problem? Thanks in advance.
Btw, how do I attach a picture so people can see my code better without having to go to my screen?
Hi @thaihoan_nguyen. You are correct that you’re getting the error because the string contains an 'H' character that can’t be converted to int.
Another student had the same error and it turned out to be a problem with how the ask_posts list was generated in a previous cell. Have a look at this post and see if it helps.
If not, it would be helpful to see your project notebook so others in the community can help with troubleshooting. (When I click the link to the mission, it takes me to my own code, not yours.) Here is a tutorial on how to share your project.
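A guess at where the 'H' could come from, not a confirmed diagnosis: if the cell that builds ask_posts appends the title string instead of the whole row, then post[4] is the fifth character of a title like "Ask HN: ...", which is 'H'. The sketch below uses one row from the sample output later in this thread; buggy_ask_posts and row are made-up names for illustration.

```python
# One row in the thread's format:
# id, title, url, num_points, num_comments, author, created_at
row = ['12296411', 'Ask HN: How to improve my personal website?', '',
       '2', '6', 'ahmedbaracat', '8/16/2016 9:55']
title = row[1]

# Hypothetical buggy version: appends the title string instead of the row
buggy_ask_posts = []
if title.lower().startswith("ask hn"):
    buggy_ask_posts.append(title)

# Indexing a string gives a single character: "Ask HN..."[4] is "H"
print(buggy_ask_posts[0][4])  # H
try:
    int(buggy_ask_posts[0][4])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'H'

# Fixed version: append the whole row, so post[4] is the comment count
ask_posts = []
if title.lower().startswith("ask hn"):
    ask_posts.append(row)

total_ask_comments = 0
for post in ask_posts:
    total_ask_comments += int(post[4])
print(total_ask_comments)  # 6
```

If print(ask_posts[:5]) shows strings rather than lists, the append step is a likely place to look.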
Also, @april.g, when you said ‘exploring the list’, what steps do you mean by that? I’m guessing you don’t just print out the list and have a look through as it’s rather lengthy…
Right, printing out the whole list wouldn’t be too helpful! You can just check the first 5 entries with print(your_list[:5]) to get a feel for what the list looks like. You will do this a lot as you’re working to either make sure your code worked as expected, or to figure out what went wrong.
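Building on that, here is a small sketch of a few quick checks, assuming ask_posts should be a list of rows (each row a list) with the comment count at index 4. The two rows are copied from the sample output later in this thread; bad_rows is an illustrative name.

```python
# Stand-in for ask_posts, using rows from the thread's sample output
ask_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]

# Look at just the first few entries instead of the whole list
for post in ask_posts[:5]:
    print(post)

# Each entry should be a row (a list), not a title (a string)
rows_are_lists = all(isinstance(post, list) for post in ask_posts)
print(rows_are_lists)  # True

# Entries whose comment column (index 4) int() would reject
bad_rows = [post for post in ask_posts if not post[4].isdigit()]
print(bad_rows)  # []
```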
Hello, I'm still stuck when I try to print len(ask_posts), len(show_posts) and len(other_posts). My output is 0, 0 and 4 respectively, which is quite different from the solution provided. My code looks similar to the solution's code. What could I be doing wrong?
Hi @jamoko77.poo. There’s probably something else going on from earlier in the code that’s causing this issue. I have a couple ideas to start with:
If you’ve left and come back to a project, make sure you’ve rerun all the cells since the kernel was restarted.
Check to make sure you didn't accidentally overwrite the hn list of lists at the beginning when you're separating the headers from the rest of the data.
If you’re still stuck, it might be easiest if you either put in the code you have so far, or attach a copy of your .ipynb file so that we can have a look at the notebook to help diagnose.
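To illustrate the second point, here is a minimal sketch of separating the header row with Python's csv module. The two-line CSV text is a stand-in for the real file (its header names are assumed), and broken_hn is a hypothetical name showing what an accidental overwrite such as hn = hn[0] would leave behind.

```python
import csv
import io

# Stand-in for the dataset file: an assumed header row plus one data row
csv_text = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12296411,Ask HN: How to improve my personal website?,,2,6,ahmedbaracat,8/16/2016 9:55\n"
)
rows = list(csv.reader(io.StringIO(csv_text)))

# Correct: keep the header in its own variable and the data rows in hn
headers = rows[0]
hn = rows[1:]
print(headers[4], hn[0][4])  # num_comments 6

# Mistake: hn = hn[0] would keep only the header row, so hn would become
# a list of column-name strings instead of a list of rows
broken_hn = rows[0]
print(broken_hn[:2])  # ['id', 'title']
```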
If the lowercase version of title starts with "ask hn", append the row to ask_posts.
Else if the lowercase version of title starts with "show hn", append the row to show_posts.
Solution
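```
for post in hn:  # hn holds the data rows; the three lists start out empty
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(post)
    elif title.lower().startswith("show hn"):
        show_posts.append(post)
    else:
        other_posts.append(post)
```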
Here, we convert each title to lowercase with title.lower() because not every Ask HN title uses the same capitalization. To be specific, 2 of them start with "ASK HN" instead of "Ask HN".
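A small sketch of why the .lower() call matters, using made-up titles. Only the id and title columns are filled in, which is a simplification; in the real data the title is also at index 1.

```python
# Made-up rows: id and title only
hn = [
    ['1', 'Ask HN: How do you learn?'],
    ['2', 'ASK HN: Same prefix, different capitalization'],
    ['3', 'Show HN: My weekend project'],
    ['4', 'A regular link post'],
]

ask_posts, show_posts, other_posts = [], [], []
for post in hn:
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(post)
    elif title.lower().startswith("show hn"):
        show_posts.append(post)
    else:
        other_posts.append(post)

print(len(ask_posts), len(show_posts), len(other_posts))  # 2 1 1

# Without .lower(), the "ASK HN" title is missed
case_sensitive_count = sum(post[1].startswith("Ask HN") for post in hn)
print(case_sensitive_count)  # 1
```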
I have a similar problem. I think our .csv file is different: I inspected the first column and there is no ‘ask hn’ / ‘Ask HN’ or ‘show hn’ / ‘Show HN’ at all.
```
ask_hn_posts = []
for post in hn:
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_hn_posts.append(post)
print(ask_hn_posts[:5])
```
Output:
```
[
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
    ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
    ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']
]
```
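Since the title sits at index 1 rather than index 0, one quick sanity check is to print each header with its index before filtering. A sketch, assuming the header names below; they are chosen to match the seven fields in the sample output in this thread, since the real header row isn't shown here.

```python
# Assumed header names, matching the seven fields in the sample rows
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
hn = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38'],
]

# Print each column index next to its name
for index, name in enumerate(headers):
    print(index, name)

# Look the title column up by name instead of guessing its position
title_index = headers.index("title")
ask_count = sum(1 for post in hn if post[title_index].lower().startswith("ask hn"))
print(title_index, ask_count)  # 1 2
```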
I’m having issues calculating counts_by_hour and comments_by_hour, and I’m not sure what I’m doing wrong. I haven’t completed the markdown part of the project yet; I'm working my way through the code first. Any advice would be helpful.
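A minimal sketch of one way to build both dictionaries, assuming created_at is at index 6 in the "8/16/2016 9:55" format shown in the sample output, and the comment count is at index 4. The first five rows come from that output; the sixth is a made-up row added so that one hour appears twice. This is not necessarily identical to the project's official solution.

```python
import datetime as dt

# Rows in the thread's format; the last row is hypothetical
ask_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
    ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
    ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38'],
    ['0', 'Ask HN: hypothetical extra post', '', '1', '4', 'someone', '8/17/2016 9:05'],
]

date_format = "%m/%d/%Y %H:%M"  # parses strings like "8/16/2016 9:55"
counts_by_hour = {}    # hour -> number of posts created in that hour
comments_by_hour = {}  # hour -> total comments on those posts

for post in ask_posts:
    created_at = dt.datetime.strptime(post[6], date_format)
    hour = created_at.strftime("%H")  # zero-padded hour, e.g. "09"
    num_comments = int(post[4])
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

print(counts_by_hour["09"], comments_by_hour["09"])  # 2 10
```

Dividing comments_by_hour[hour] by counts_by_hour[hour] then gives the average number of comments per post for each hour.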