Guided Project: Exploring Hacker News Posts, Step 6: Calculating the Average

https://app.dataquest.io/m/356/guided-project%3A-exploring-hacker-news-posts/6/calculating-the-average-number-of-comments-for-ask-hn-posts-by-hour

```
import datetime as dt

result_list = []

for post in ask_posts:
    create = post[6]
    comments = post[4]
    result_list.append([create, comments])

# print(result_list)

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    hour = row[0]
    comments = int(row[1])
    dt_hour = dt.datetime.strptime(hour, "%m/%d/%Y %H:%M")
    dt_str_hour = dt_hour.strftime("%H")
    if dt_str_hour not in counts_by_hour:
        counts_by_hour[dt_str_hour] = 1
        comments_by_hour[dt_str_hour] = comments
    else:
        counts_by_hour[dt_str_hour] += 1
        comments_by_hour[dt_str_hour] += comments

print('Count of posts by hour:', counts_by_hour)
print('\n')
print('Number of comments by hour:', comments_by_hour)
```
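As a side note, the `strptime`/`strftime` round-trip in the loop above can be checked in isolation. A minimal sketch using a made-up timestamp in the dataset's `created_at` format:

```python
import datetime as dt

# Sketch: parse a created_at string and extract the zero-padded hour,
# exactly as the loop above does. The sample value is hypothetical.
sample = "8/16/2016 9:55"  # dataset format: %m/%d/%Y %H:%M
hour = dt.datetime.strptime(sample, "%m/%d/%Y %H:%M").strftime("%H")
print(hour)  # "09"
```

`%m`, `%d`, and `%H` accept non-zero-padded input when parsing, and `strftime("%H")` always emits a two-digit hour, which is why the dictionary keys come out as `'09'`, `'02'`, and so on.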
```
avg_comments_hour = []

for comment in comments_by_hour:
    for count in counts_by_hour:
        if count == comment:
            avg_comments_hour.append([comment, round(comments_by_hour[comment] / counts_by_hour[count], 2)])
        else:
            continue

print('Average comments per hour on Ask HN posts:', '\n', avg_comments_hour)
```
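As an aside, the nested loop isn't necessary: both dictionaries are keyed by the same hour strings, so a single pass over `comments_by_hour` suffices. A minimal sketch, with made-up sample dictionaries standing in for the real ones:

```python
# Sketch: average comments per post for each hour in a single pass.
# These two small dictionaries are hypothetical stand-ins.
counts_by_hour = {"09": 45, "13": 85}
comments_by_hour = {"09": 251, "13": 1253}

avg_comments_hour = []
for hour in comments_by_hour:
    avg_comments_hour.append([hour, round(comments_by_hour[hour] / counts_by_hour[hour], 2)])

print(avg_comments_hour)  # [['09', 5.58], ['13', 14.74]]
```

This produces the same result as the nested version but does one dictionary lookup per hour instead of scanning all of `counts_by_hour` for every key.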
What I expected to happen:

```
[
['09', 5.5777777777777775],
['13', 14.741176470588234],
['10', 13.440677966101696],
['14', 13.233644859813085],
['16', 16.796296296296298],
['23', 7.985294117647059],
['12', 9.41095890410959],
['17', 11.46],
['15', 38.5948275862069],
['21', 16.009174311926607],
['20', 21.525],
['02', 23.810344827586206],
['18', 13.20183486238532],
['03', 7.796296296296297],
['05', 10.08695652173913],
['19', 10.8],
['01', 11.383333333333333],
['22', 6.746478873239437],
['08', 10.25],
['04', 7.170212765957447],
['00', 8.127272727272727],
['06', 9.022727272727273],
['07', 7.852941176470588],
['11', 11.051724137931034]
]
```

What actually happened:

```
[['02', 11.14], ['01', 7.41], ['22', 8.8], ['21', 8.69], ['19', 7.16], ['17', 9.45], ['15', 28.68], ['14', 9.69], ['13', 16.32], ['11', 8.96], ['10', 10.68], ['09', 6.65], ['07', 7.01], ['03', 7.95], ['23', 6.7], ['20', 8.75], ['16', 7.71], ['08', 9.19], ['00', 7.56], ['18', 7.94], ['12', 12.38], ['04', 9.71], ['06', 6.78], ['05', 8.79]]
```

Other details:

My averages of comments by hour are off. The math of dividing the number of comments by the count of posts is correct when I check it against my two dictionaries.

I did some digging: my ask_posts list is 9,139 rows long.

The number of comments from those 9,139 rows is equal to 94,986. The total sum of values in my comments_by_hour dictionary is also equal to 94,986.

The total sum of values in my counts_by_hour dictionary is equal to 9,139.
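Those totals can be sanity-checked with `sum()` over each dictionary's values. A quick sketch, with hypothetical small dictionaries in place of the real ones:

```python
# Sketch: verify the totals across both dictionaries with sum().
# These small dictionaries are hypothetical stand-ins for the real ones.
counts_by_hour = {"09": 2, "13": 3}
comments_by_hour = {"09": 10, "13": 30}

total_posts = sum(counts_by_hour.values())       # should match len(ask_posts)
total_comments = sum(comments_by_hour.values())  # should match the raw comment total
print(total_posts, total_comments)
```

If both totals match the raw data, the grouping step is correct, which narrows the discrepancy down to the data itself.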

However, my results differ from the expected output of average comments by hour shown on step 7. I can’t tell where I went wrong. Help would be appreciated!

Looking at this thread: "Guided Project: Exploring Hacker News Posts Average Number of Posts differs from guide".

My averages of comments by hour match that poster's after April G.'s corrections are implemented. She said that the averages are correct. However, they still differ from the provided solution?

Are you by chance using the original dataset from Hacker News? I suspect that's why you're seeing the difference. (I downloaded it and reran my code, and I'm getting the same results you are.)

The dataset used in the solution guide (and on the platform) is different from the one you can download directly from Hacker News:

> You can find the data set here, but note that it has been reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments, and then randomly sampling from the remaining submissions.

If you download the dataset being used on the platform (the Download button above the Jupyter notebook in the mission), I suspect you’ll see the same results as the solution notebook.

Yes, I’m using the dataset from Kaggle. I didn’t see the download button or solution notebook above the Jupyter notebook in the mission.

I’ll reference those.

Thank you