Guided Project: Exploring Hacker News Posts

Screen Link: https://app.dataquest.io/m/356/guided-project%3A-exploring-hacker-news-posts/3/extracting-ask-hn-and-show-hn-posts

Your Code:
```
total_ask_comments = 0

for post in ask_posts:
    total_ask_comments += int(post[4])

avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)
```

What I expected to happen: The number of comments at column index 4 will be converted from string to integer.

What actually happened:
```
ValueError                                Traceback (most recent call last)
in ()
      2
      3 for post in ask_posts:
----> 4     total_ask_comments += int(post[4])
      5
      6 avg_ask_comments = total_ask_comments / len(ask_posts)

ValueError: invalid literal for int() with base 10: 'H'
```

Other details: Am I correct in understanding that Python is flagging that the value at column index 4 cannot be converted from string to integer because some strings contain the letter ‘H’? How do I go about fixing this? The guided project doesn’t mention that the data in that column needs cleaning. I compared my code so far against the solution and they are the same. Has anyone encountered a similar problem? Thanks in advance.
Btw, how do I attach a picture so people can see my code better without having to go to my screen?

Hi @thaihoan_nguyen. You are correct that you’re getting the error because the string contains an 'H' character that can’t be converted to int.

Another student had the same error and it turned out to be a problem with how the ask_posts list was generated in a previous cell. Have a look at this post and see if it helps.

If not, it would be helpful to see your project notebook so others in the community can help with troubleshooting. (When I click the link to the mission, it takes me to my own code, not yours.) Here is a tutorial on how to share your project.
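As an illustration of how this error can come about, here is a minimal sketch. It assumes (this is not confirmed for your notebook) that the entries in ask_posts ended up as plain strings, such as titles, instead of lists of columns. Indexing a string with [4] returns a single character, and the fifth character of any title starting with "Ask HN" is 'H'. The sample row is copied from the output shown later in this thread.

```python
# One row of the dataset as a list of columns (taken from this thread).
row_as_list = ['12296411', 'Ask HN: How to improve my personal website?', '',
               '2', '6', 'ahmedbaracat', '8/16/2016 9:55']

# Assumption for illustration: only the title string was stored instead of the row.
row_as_string = row_as_list[1]

# Indexing the list gives the whole num_comments field, which int() can parse.
comments = int(row_as_list[4])

# Indexing the string gives one character, which int() cannot parse.
try:
    int(row_as_string[4])
    failed = False
    message = ""
except ValueError as err:
    failed = True
    message = str(err)

print(comments)
print(failed, message)
```

If this is what happened, checking type(ask_posts[0]) should show str rather than list.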

Hi @april.g, thanks for the explanation, I can see the problem now.

Also thanks for the tutorial :slightly_smiling_face:

Also, @april.g, when you said ‘exploring the list’, what steps do you mean by that? I’m guessing you don’t just print out the list and have a look through as it’s rather lengthy…

Right, printing out the whole list wouldn’t be too helpful! You can just check the first 5 entries with print(your_list[:5]) to get a feel for what the list looks like. You will do this a lot as you’re working to either make sure your code worked as expected, or to figure out what went wrong.
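A small sketch of that kind of quick check. The list below is a made-up stand-in for your own data, so only the pattern matters:

```python
# Hypothetical stand-in for a long list of rows.
your_list = [[str(i), "Title " + str(i)] for i in range(1000)]

print(len(your_list))       # how many rows there are
print(your_list[:5])        # the first five rows only
print(type(your_list[0]))   # each row should be a list, not a str

first_five = your_list[:5]
```

Checking the length and the type of the first element alongside the slice often shows quickly whether the list was built the way you intended.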

I also made this silly mistake and spent a lot of time figuring out the reason. Thanks for this, @april.g. We’ve got to be more careful moving forward. Haha!

Hello, I am still stuck trying to print len(ask_posts), len(show_posts) and len(other_posts). My output is 0, 0 and 4 respectively, which is quite different from the solution provided. My code looks similar to the code in the solution. What could I be doing wrong?

See attached screenshot.

Thanks

Hi @jamoko77.poo. There’s probably something else going on from earlier in the code that’s causing this issue. I have a couple ideas to start with:

  1. If you’ve left and come back to a project, make sure you’ve rerun all the cells since the kernel was restarted.
  2. Check to make sure you didn’t accidentally overwrite the hn list at the beginning when you’re separating the headers from the rest of the data.
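A minimal sketch of the second point. The small hn list and its column names are assumptions standing in for the real file; the two data rows are copied from output elsewhere in this thread.

```python
# Hypothetical small dataset with a header row (column names are assumed).
hn = [
    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]

headers = hn[0]   # keep the header row on its own
hn = hn[1:]       # keep every row after the header

print(headers)
print(len(hn))
```

Note that a cell containing hn = hn[1:] drops one more row every time it is rerun on its own, which is one reason to rerun the notebook from the top when the counts look wrong.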

If you’re still stuck, it might be easiest if you either put in the code you have so far, or attach a copy of your .ipynb file so that we can have a look at the notebook to help diagnose.

Hi there,

Maybe I’m running into the same issue. I’m getting this error: AttributeError: 'str' object has no attribute 'startwith'

Notebook:
Basics (5).ipynb (8.2 KB)
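
For reference, this message usually points to a misspelled method name: Python strings have a startswith method, but nothing called startwith. A quick check, using a title that appears elsewhere in this thread:

```python
title = "Ask HN: How to improve my personal website?"

# The built-in string method is spelled startswith (with an "s" after "start").
has_correct_name = hasattr(title, "startswith")
has_misspelled_name = hasattr(title, "startwith")

matches = title.startswith("Ask HN")

print(has_correct_name, has_misspelled_name, matches)
```

Searching the notebook for "startwith" should locate the line that raises the error.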

ask_posts = []
show_posts = []
other_posts = []
for row in hacker:
    title = row[1]
    if title.startswith('Ask HN'):
        ask_posts.append(row)
    elif title.startswith('Show HN'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print(ask_posts, show_posts)

The solution:

It was supposed to be ‘Ask HN’; what they said was ‘ask hn’.
You can go through the data to confirm.

Hi @ariwatec,

Instructions
Implement the following steps:

  • If the lowercase version of title starts with ask hn, append the row to ask_posts.
  • Else if the lowercase version of title starts with show hn, append the row to show_posts.

Solution

if title.lower().startswith("ask hn"):
    ask_posts.append(post)
elif title.lower().startswith("show hn"):
    show_posts.append(post)

Here, we are converting them to lower case using title.lower() because some of the titles don’t start with "Ask HN". To be specific, 2 of them start with "ASK HN".
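
To illustrate the difference, here is a small sketch. Apart from the first one, the titles are made up for the example; only the casing of the prefix matters.

```python
# Illustrative titles; the upper-case one mimics the "ASK HN" case.
titles = [
    "Ask HN: How to improve my personal website?",
    "ASK HN: Example title in upper case",
    "Show HN: Example project",
    "An ordinary story",
]

ask_posts, show_posts, other_posts = [], [], []
for title in titles:
    lowered = title.lower()            # normalize the case before comparing
    if lowered.startswith("ask hn"):
        ask_posts.append(title)
    elif lowered.startswith("show hn"):
        show_posts.append(title)
    else:
        other_posts.append(title)

# A case-sensitive check would miss the "ASK HN" title.
case_sensitive_ask = [t for t in titles if t.startswith("Ask HN")]

print(len(ask_posts), len(show_posts), len(other_posts))
print(len(case_sensitive_ask))
```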

Best,
Sahil

Hi @jamoko77.poo,

I have a similar problem. I think our .csv file is different. I have inspected the first column, and there is no ‘ask hn’ / ‘Ask HN’ or ‘show hn’ / ‘Show HN’ at all.

Can anyone help?
cc: @Sahil @nityesh

Thank you for any help guys.

Hi @rpmuayyad,

Try running this code:

ask_hn_posts = []

for post in hn:
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_hn_posts.append(post)
        
print(ask_hn_posts[:5])

Output:

[
['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']
]

Best,
Sahil

Hi @Sahil,

Thank you for your reply.
I have tried your code but didn’t get the same result.

Best,

Hi @rpmuayyad,

Can you please share with us the jupyter notebook file (.ipynb)? I will take a look.

Best,
Sahil

Hi @Sahil,

Yes. Please have a look at my code.
Basics.ipynb (5.1 KB)
Thank you very much.

Best,
Raden

Hi @rpmuayyad,

Please change hn = list(opened_file) to hn = list(readed_file) and run your code from start to finish.
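
Here is a small sketch of why this matters. It uses io.StringIO and a made-up two-line CSV as a stand-in for your real file; opened_file and readed_file are the names from your notebook.

```python
import io
from csv import reader

# Hypothetical two-line CSV standing in for the real file.
sample = "id,title\n12296411,Ask HN: How to improve my personal website?\n"

# list() on the file object gives one raw string per line.
opened_file = io.StringIO(sample)
rows_as_strings = list(opened_file)

# list() on the csv reader gives one list of columns per line.
opened_file = io.StringIO(sample)
readed_file = reader(opened_file)
rows_as_lists = list(readed_file)

print(repr(rows_as_strings[1][1]))   # a single character from the line
print(repr(rows_as_lists[1][1]))     # the whole title column
```

With raw strings, post[1] is only one character, so title.lower().startswith("ask hn") can never be true and ask_hn_posts stays empty.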

Best,
Sahil

Hi @Sahil,

Wow! That’s impressive!
Thank you for finding it out.

Best,
Raden

Hello,

I’m having issues calculating counts_by_hour and comments_by_hour, and I’m not sure what I’m doing wrong. I haven’t completed the markdown part of the project; I’m working my way through the code first. Any advice would be helpful.

Basics.ipynb (6.1 KB)
