Guided Project: Exploring Hacker News Posts, step 7: sorting and printing values

The project shows what the output of step 6 should be: a different average number of comments for each hour.
My code instead displayed 0.5 for every average, which is clearly wrong.

Basics (1).ipynb (10.2 KB)

Edit: after printing the total comments per category, I can see that I should be distributing about 24,000 comments across 24 hours, which averages out to roughly 1,000 comments per hour.

But both counts_by_hour and coms_by_hour only come to a couple hundred per hour.

def com_count(post_section):
    total_comments = 0
    for row in post_section:
        # column 4 holds the number of comments on the post
        total_comments += int(row[4])
    # compute the average once, after the loop has finished
    avg_comments = round(total_comments / len(post_section), 4)
    print(avg_comments)
    print(total_comments)

Output:
The average and total number of comments on Ask HN posts is:
14.0315
24485

The average and total number of comments on Show HN posts is:
10.3021
12002

import datetime as dt
result_list = []
for row in ask_posts:    
    created_at = row[6]
    comments = int(row[4])
    result_list.append([created_at, comments])
    counts_by_hour = {}
    coms_by_hour = {}
    for row in result_list:
        dt_string= row[0]
        #m/d/yyyy 24:mm
        dt_object = dt.datetime.strptime(dt_string, '%m/%d/%Y %H:%M')
        post_hour = dt_object.strftime('%H')
        if post_hour not in counts_by_hour:
            counts_by_hour[post_hour] = 1
            coms_by_hour[post_hour] = comments
        else:
            counts_by_hour[post_hour] += 1
            coms_by_hour[post_hour] += comments
print(counts_by_hour)
print(lb)
print(coms_by_hour)

{'06': 44, '18': 109, '05': 46, '11': 58, '23': 69, '21': 109, '09': 45, '14': 107, '20': 80, '10': 59, '08': 48, '15': 116, '13': 85, '12': 73, '02': 58, '01': 60, '16': 108, '17': 100, '07': 34, '22': 71, '19': 110, '00': 55, '03': 54, '04': 47}

{'06': 88, '18': 218, '05': 92, '11': 116, '23': 138, '21': 218, '09': 90, '14': 214, '20': 160, '10': 118, '08': 96, '15': 232, '13': 170, '12': 146, '02': 116, '01': 120, '16': 216, '17': 200, '07': 68, '22': 142, '19': 220, '00': 110, '03': 108, '04': 94}

So the comments accounted for add up to only 3,490, out of the expected 24,485.
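
One way to run this check, as a minimal sketch: sum the values of the coms_by_hour dictionary printed above and compare the result with the total that com_count reported for Ask HN posts. The names expected_total and accounted_for are only for illustration.

```python
# Comment totals per hour, copied from the output above.
coms_by_hour = {
    '06': 88, '18': 218, '05': 92, '11': 116, '23': 138, '21': 218,
    '09': 90, '14': 214, '20': 160, '10': 118, '08': 96, '15': 232,
    '13': 170, '12': 146, '02': 116, '01': 120, '16': 216, '17': 200,
    '07': 68, '22': 142, '19': 220, '00': 110, '03': 108, '04': 94,
}
expected_total = 24485  # total Ask HN comments reported by com_count

accounted_for = sum(coms_by_hour.values())
print(accounted_for)                   # 3490
print(expected_total - accounted_for)  # 20995 comments missing
```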

It looks like the problem is in the inner loop, for row in result_list:. I was able to get the right numbers when I added the line comments = row[1] inside it. I think what's happening is that the original code was still using the comments = int(row[4]) value set in the outer loop, which didn't match the number of comments in the current row of result_list. Adding comments = row[1] fixed the issue.
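
Here is a small sketch of that stale-variable effect. The rows are made up for illustration (they only mimic the [created_at, comments] shape of result_list), and the starting value of 2 is an assumption standing in for whatever the outer loop last assigned.

```python
# Hypothetical rows in the same shape as result_list: [created_at, comments].
result_list = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 10:14', 1],
]

comments = 2  # stands in for the value left over from the outer loop

stale_total = 0
fresh_total = 0
for row in result_list:
    stale_total += comments   # bug: never re-read, so this always adds 2
    comments_in_row = row[1]  # fix: read the value from the current row
    fresh_total += comments_in_row

print(stale_total)  # 6
print(fresh_total)  # 36
```

Every row gets credited with the same leftover number, which would explain why each hour's comment total in the output is exactly twice its post count.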

As an aside, I found that the cell ran really slowly. I moved the for row in result_list loop out of the first loop, so it runs once at the end instead of once for every post. I got the same results but the code ran much faster.
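
A rough sketch of why the nested version is slow, assuming the inner loop re-reads everything in result_list after each append, as in the original cell. The function name nested_iterations is made up for illustration, and 1745 is the sum of the counts_by_hour values printed earlier in the thread.

```python
def nested_iterations(n_posts):
    """Count inner-loop passes when result_list is re-scanned after every append."""
    total = 0
    for rows_so_far in range(1, n_posts + 1):
        total += rows_so_far  # the inner loop visits every row added so far
    return total

n_posts = 1745  # number of Ask HN posts, from the counts_by_hour output

print(nested_iterations(n_posts))  # 1523385 passes with the nested loop
print(n_posts)                     # 1745 passes with two separate loops
```

Each of those passes also calls strptime, so the nested cell parses dates over a million times instead of 1,745 times.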

import datetime as dt
result_list = []
for row in ask_posts:
    created_at = row[6]
    comments = int(row[4])
    result_list.append([created_at, comments])
counts_by_hour = {}
coms_by_hour = {}
for row in result_list:
    dt_string= row[0]
    comments = row[1]
    #m/d/yyyy 24:mm
    dt_object = dt.datetime.strptime(dt_string, '%m/%d/%Y %H:%M')
    post_hour = dt_object.strftime('%H')
    if post_hour not in counts_by_hour:
        counts_by_hour[post_hour] = 1
        coms_by_hour[post_hour] = comments
    elif post_hour in counts_by_hour:
        counts_by_hour[post_hour] += 1
        coms_by_hour[post_hour] += comments
print(counts_by_hour)
print(lb)
print(coms_by_hour)

Oh, also, I looked at the avg_by_hour cell. You needed to divide the number of comments by the hour counts (you had it reversed).

avg_by_hour = []
for hour in counts_by_hour:
    avg_by_hour.append([hour, (coms_by_hour[hour] / counts_by_hour[hour])])
print(avg_by_hour)
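
As a quick illustration of the reversed division, here are the numbers for hour '06' from the buggy run printed earlier in the thread. This is likely where the 0.5 averages came from: every post was being credited with 2 comments, so posts divided by comments gives 0.5 for every hour. The names reversed_avg and correct_avg are just for this sketch.

```python
# Values for hour '06' from the buggy run printed earlier in the thread.
counts_by_hour = {'06': 44}
coms_by_hour = {'06': 88}

reversed_avg = counts_by_hour['06'] / coms_by_hour['06']  # posts per comment
correct_avg = coms_by_hour['06'] / counts_by_hour['06']   # comments per post

print(reversed_avg)  # 0.5
print(correct_avg)   # 2.0
```

Note that 2.0 is only the right way round, not the right answer: it still reflects the stale comments value, so both fixes are needed to get the real averages.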

April.g, I tried the avg_by_hour code that you posted here, but these are my results:

counts_by_hour = {}
comments_by_hour = {}
for result in result_list:
    #print(result)
    date_obj = dt.strptime(result[0],"%m/%d/%Y %H:%M")
    #print(date_obj)
    date_obj = dt.strftime(date_obj,"%H")
    #print(date_obj)
    if date_obj not in counts_by_hour:
        counts_by_hour[date_obj]=1
        comments_by_hour[date_obj]=A2
    else:
        counts_by_hour[date_obj] += 1
        comments_by_hour[date_obj] += A2

print (counts_by_hour)
print (comments_by_hour)

avg_comment = []
for hour in counts_by_hour:
    avg_by_hour = avg_comment.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])

RESULT: [['10', 2.0], ['22', 2.0], ['05', 2.0], ['09', 2.0], ['18', 2.0], ['17', 2.0], ['02', 2.0], ['19', 2.0], ['13', 2.0], ['03', 2.0], ['21', 2.0], ['15', 2.0], ['01', 2.0], ['14', 2.0], ['12', 2.0], ['00', 2.0], ['16', 2.0], ['04', 2.0], ['07', 2.0], ['20', 2.0], ['23', 2.0], ['06', 2.0], ['08', 2.0], ['11', 2.0]]

Hi Anish. I'm not sure where the trouble is at first glance because I'm not able to run your code as posted. I don't know what the A2 variable refers to here and I don't want to assume anything. However, when I changed A2 to result[1], I got the correct results for the averages. If changing that doesn't work for you, then the issue might be at an earlier step. In that case, it would be easier to have a look at your notebook file.
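
For reference, a minimal sketch of that change on a few made-up rows. The import line is an assumption based on the code calling dt.strptime directly, and the sample rows are hypothetical. The sketch also drops the avg_by_hour = assignment, since list.append returns None and the results end up in avg_comment.

```python
from datetime import datetime as dt  # assumed import, since the code calls dt.strptime

# Hypothetical rows in the shape [created_at, comments].
result_list = [
    ['8/16/2016 9:55', 6],
    ['8/17/2016 9:10', 4],
    ['11/22/2015 13:43', 29],
]

counts_by_hour = {}
comments_by_hour = {}
for result in result_list:
    hour = dt.strptime(result[0], "%m/%d/%Y %H:%M").strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = result[1]   # was A2
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += result[1]  # was A2

avg_comment = []
for hour in counts_by_hour:
    # append modifies the list in place and returns None
    avg_comment.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

print(avg_comment)  # [['09', 5.0], ['13', 29.0]]
```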


Hi April, thank you for this. I will try to send you the notebook file, as it keeps resetting and I have to go all the way back to the beginning each time.