Hacker News Mission 5 - Need Help

Screen Link: https://app.dataquest.io/m/356/guided-project%3A-exploring-hacker-news-posts/1/introduction

Your Code:
```
import datetime as dt

result_list = []
for row in ask_posts:
    created_at = row[6]
    n_comments = row[4]
    result_list.append([created_at, n_comments])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    time = row[0]
    date_time = dt.datetime.strptime(time, "%m/%d/%Y %H:%M")
    hour = date_time.strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = int(row[1])
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += int(row[1])
```

What I expected to happen:
I would create the frequency tables so that I could then calculate my averages

What actually happened: I am getting many errors and need some suggestions from the community.
```
ValueError                                Traceback (most recent call last)
in ()
     12 for row in result_list:
     13     time = row[0]
---> 14     date_time = dt.datetime.strptime(time, "%m/%d/%Y %H:%M")
     15     hour = date_time.strftime("%H")
     16     if hour not in counts_by_hour:

/usr/lib/python3.4/_strptime.py in _strptime_datetime(cls, data_string, format)
    498     """Return a class cls instance based on the input string and the
    499     format string."""
--> 500     tt, fraction = _strptime(data_string, format)
    501     tzname, gmtoff = tt[-2:]
    502     args = tt[:6] + (fraction,)

/usr/lib/python3.4/_strptime.py in _strptime(data_string, format)
    335     if not found:
    336         raise ValueError("time data %r does not match format %r" %
--> 337                          (data_string, format))
    338     if len(data_string) != found.end():
    339         raise ValueError("unconverted data remains: %s" %

ValueError: time data ':' does not match format '%m/%d/%Y %H:%M'
```

Other details:

The part of the error message that will help us the most is at the end: ValueError: time data ':' does not match format '%m/%d/%Y %H:%M'. It seems like what’s happening is that time = row[0] has the value ':' instead of the date string we’re expecting.

Just looking at the code here, I can’t spot the problem. The first thing I would do is check in another cell what result_list[:5] looks like, to see whether the first few rows at least match the format we’re expecting (['8/16/2016 9:55', 6]). If that part checks out, then perhaps one of the later entries is malformed, but I’m not sure where or how that would happen. If it doesn’t check out, then something might have happened to ask_posts by accident, so it would be good to inspect it in the same way.
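
In case it helps, here is a minimal sketch of that check. The sample result_list below is made up for illustration (the real one comes from your notebook), and find_bad_rows is just a hypothetical helper name, not something from the guided project.

```python
import datetime as dt

DATE_FORMAT = "%m/%d/%Y %H:%M"

# Made-up sample standing in for the notebook's result_list (an assumption).
result_list = [
    ["8/16/2016 9:55", "6"],
    [":", "h"],                  # the kind of entry that triggers the ValueError
    ["11/22/2015 13:43", "29"],
]


def find_bad_rows(rows, date_format=DATE_FORMAT):
    """Return (index, row) pairs whose first field does not parse as a date."""
    bad = []
    for index, row in enumerate(rows):
        try:
            dt.datetime.strptime(row[0], date_format)
        except ValueError:
            bad.append((index, row))
    return bad


print(result_list[:5])             # eyeball the first few rows
print(find_bad_rows(result_list))  # list every row that fails to parse
```

If the second print shows every row as bad, the problem is most likely upstream in ask_posts rather than in a single odd entry.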

It might help the most if you could share your notebook so that others can get into the code and have a closer look at what’s going on.

I see I can’t attach a file here. How would I go about sharing the notebook in the forum?

Hmm, I thought you should be able to see this button in the post editor window of the forum:
[image: upload button in the post editor]

:thinking: If you’re not able to upload the file here, do you have something like a Google Drive or GitHub where you can share files? It’s the only other way I can think of aside from copying and pasting all the code in the forum itself (which because of formatting issues really isn’t ideal).

Great tip. I have put the notebook here in GitHub.

It is the only file there. Thanks!

Looks like my issue is with the ask_posts list of lists. There is data missing there.

Thanks a lot, it really helped to see all the code to try to figure out what was going on! I was able to figure it out, and it starts toward the beginning in this section of code:

ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(title)          #here
    elif title.startswith('show hn'):
        show_posts.append(title)     #here
    else:
        other_posts.append(title)        #and here

When you execute print(ask_posts[:2]), these are the results:

['ask hn: how to improve my personal website?', 'ask hn: am i the only one outraged by twitter shutting down share counts?']

What we want to see is not just the titles of the articles, but all the data in the row of those titles so that later we can access the number of comments, dates, etc. Instead of appending title for each of these, you want to append the row.
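
As a rough sketch of that fix: the hn sample below is mostly made up (only the RoboBrowser row comes from this thread), with columns in the same order as that row. It also shows where the ':' in the error came from, since indexing a title string with [6] returns a single character.

```python
# Small sample in the same column order as the row shown in this thread.
# The first two rows are invented for illustration.
hn = [
    ["1", "Ask HN: How to improve my personal website?", "", "2", "6",
     "user_a", "8/16/2016 9:55"],
    ["2", "Show HN: A made-up project", "https://example.com", "3", "1",
     "user_b", "11/22/2015 13:43"],
    ["11680777", "RoboBrowser: Your friendly neighborhood web scraper",
     "https://github.com/jmcarp/robobrowser", "182", "58", "pmoriarty",
     "5/12/2016 1:43"],
]

ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1].lower()
    if title.startswith("ask hn"):
        ask_posts.append(row)      # append the whole row, not just the title
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

# With only the title stored, [6] picks out one character of the string.
title_only = "ask hn: how to improve my personal website?"
print(repr(title_only[6]))   # the character strptime complained about

# With the whole row stored, [6] is the date string and [4] the comment count.
print(ask_posts[0][6], ask_posts[0][4])
```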

The reason that the code seemed to be working and didn’t throw up an error earlier is because of how the code was written in the cell where you calculated the averages:

total_ask_comments = 0
for comments in ask_posts:                   #comments is the iteration variable
    num_comments = row[4]                    #but where did row come from?
    num_comments = int(num_comments)
    total_ask_comments += num_comments
    avg_ask_comments = total_ask_comments / len(ask_posts)

#I didn't want to copy the whole thing

Since row was used instead of comments, the loop uses ask_posts only to decide how many times to iterate and never reads its entries. Instead, it accesses whatever row held at the end of the previous loop in which it was used. On every iteration, row looks like this:

['11680777', 'RoboBrowser: Your friendly neighborhood web scraper', 'https://github.com/jmcarp/robobrowser', '182', '58', 'pmoriarty', '5/12/2016 1:43']

And so row[4] is always 58, which made the ask_posts average 58. (show_posts is not 58 because of a typo; I’ll let you sort it out :slight_smile: )
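
Here is a small sketch of that effect, assuming made-up ask posts with 6 and 10 comments (only the RoboBrowser row is from your data). A Python loop variable keeps its last value after the loop ends, so a later loop that reads row by mistake sees the same stale row every time.

```python
# Invented sample rows, except the last one, which is the row quoted above.
hn = [
    ["1", "Ask HN: first made-up question", "", "2", "6",
     "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: second made-up question", "", "4", "10",
     "user_b", "11/22/2015 13:43"],
    ["11680777", "RoboBrowser: Your friendly neighborhood web scraper",
     "https://github.com/jmcarp/robobrowser", "182", "58", "pmoriarty",
     "5/12/2016 1:43"],
]

for row in hn:
    pass  # after this loop, row still refers to the last row of hn

ask_posts = hn[:2]

# Buggy pattern: the iteration variable is comments, but the body reads row.
total_ask_comments = 0
for comments in ask_posts:
    total_ask_comments += int(row[4])   # stale row, always '58'
buggy_avg = total_ask_comments / len(ask_posts)

# Fixed pattern: read from the iteration variable, average after the loop.
total_ask_comments = 0
for post in ask_posts:
    total_ask_comments += int(post[4])
fixed_avg = total_ask_comments / len(ask_posts)

print(buggy_avg, fixed_avg)
```

Using one consistent name for the iteration variable, and computing the average once after the loop instead of inside it, makes this kind of slip easier to spot.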

I hope that helps!

Huge help, thank you. The point about going back and checking the list of lists earlier in the code is a great tip. As soon as I did that, I could see my error was upstream in the code. I will use that technique going forward.

Strange I messed up the iteration variable. I almost always use row so I don’t create this type of confusion.
