Python Guided Project 2, Step 5: Output doesn't match up

Screen Link:

My Code:

import datetime as dt

result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])
    
counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list:
    post_time = row[0]
    comments = row[1]
    dt_object = dt.datetime.strptime(post_time, date_format)
    post_hour = dt_object.strftime("%H")
    if post_hour not in counts_by_hour:
        counts_by_hour[post_hour] = 1
        comments_by_hour[post_hour] = comments
    if post_hour in counts_by_hour:
        counts_by_hour[post_hour] += 1
        comments_by_hour[post_hour] += comments
        
print(comments_by_hour)

The output that I received (below) differs slightly from the output given in the solution notebook. However, my code is similar to the solution code, and all of my numbers (averages etc.) matched up until this step, so I’m not sure what’s gone wrong here.

{'09': 257, '13': 1282, '10': 794, '14': 1419, '16': 1831, '23': 544, '12': 691, '17': 1147, '15': 4478, '21': 1749, '20': 1724, '02': 1384, '18': 1441, '03': 422, '05': 493, '19': 1191, '01': 716, '22': 481, '08': 497, '04': 340, '00': 457, '06': 398, '07': 269, '11': 643}

Below is the output given by the solution notebook, for reference.

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

Hi @colleen.mccaskell,

The issue with your code lies in the two consecutive if-statements:

    if post_hour not in counts_by_hour:
        counts_by_hour[post_hour] = 1
        comments_by_hour[post_hour] = comments
    if post_hour in counts_by_hour:
        counts_by_hour[post_hour] += 1
        comments_by_hour[post_hour] += comments

On the first post for a given hour, the first if-statement adds the hour to both dictionaries. The second if-statement is then checked in the same iteration, and because the hour is now present, it adds 1 (and the comment count) a second time. As a result, the first post of every hour is counted twice, which is why every value in your output is greater than the corresponding value in the solution.
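To see the double count in isolation, here is a minimal sketch with a single made-up post (hour "09" with 6 comments; these values are assumptions for illustration, not rows from the dataset):

```python
counts_by_hour = {}
comments_by_hour = {}

# Hypothetical single post: created in hour "09" with 6 comments
post_hour, comments = "09", 6

if post_hour not in counts_by_hour:
    counts_by_hour[post_hour] = 1
    comments_by_hour[post_hour] = comments
# This check runs in the same iteration and is now true,
# because the block above has just added the key.
if post_hour in counts_by_hour:
    counts_by_hour[post_hour] += 1
    comments_by_hour[post_hour] += comments

print(counts_by_hour)    # {'09': 2}  (one post counted twice)
print(comments_by_hour)  # {'09': 12} (6 comments counted twice)
```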

I suggest either swapping the order of the if-statements (so that the first check is whether the hour IS in the dictionary, and only then whether it IS NOT), or replacing the second if-statement with ‘else’.
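A minimal sketch of the ‘else’ option, assuming a few made-up rows in place of your result_list (the name sample_rows and its values are hypothetical, not taken from the dataset):

```python
import datetime as dt

# Hypothetical stand-in for result_list: [created_at, num_comments]
sample_rows = [
    ["8/16/2016 9:55", 6],
    ["8/16/2016 9:10", 4],
    ["11/22/2015 13:43", 29],
]

counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for post_time, comments in sample_rows:
    post_hour = dt.datetime.strptime(post_time, date_format).strftime("%H")
    if post_hour not in counts_by_hour:
        # First post seen for this hour: initialise both dictionaries
        counts_by_hour[post_hour] = 1
        comments_by_hour[post_hour] = comments
    else:
        # Hour already present: update the running totals exactly once
        counts_by_hour[post_hour] += 1
        comments_by_hour[post_hour] += comments

print(counts_by_hour)    # {'09': 2, '13': 1}
print(comments_by_hour)  # {'09': 10, '13': 29}
```

With ‘else’, only one of the two branches can run per iteration, so each post is counted once.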

Hope it was useful.

Thanks, Elena. I see now that once the hour is added to the dictionary, it satisfies the conditions of the second if statement, so it makes sense that these values are being double-counted. I’ve now changed the second statement to ‘else’ and the numbers all reconcile.

Appreciate the explanation!


Where can I find the solution to the notebook?

Hi @chautran0729, this is the GitHub repo of recommended solutions. To find the solution for a specific mission, click on the ID of the mission, in this case 356, as per the mission link: https://app.dataquest.io/m/356/guided-project:-exploring-hacker-news-posts/5/finding-the-amount-of-ask-posts-and-comments-by-hour-created
