Guided Project: Exploring Hacker News Posts Average Number of Posts differs from guide

Hi, I am afraid I am not getting the right average number of posts. There also doesn’t appear to be any pattern in the average number of posts, or am I missing something?

At this link,
https://app.dataquest.io/m/356/guided-project%3A-exploring-hacker-news-posts/7/sorting-and-printing-values-from-a-list-of-lists

the average number of posts is shown as below:

[
['09', 5.5777777777777775],
['13', 14.741176470588234],
['10', 13.440677966101696],
['14', 13.233644859813085],
['16', 16.796296296296298],
['23', 7.985294117647059],
['12', 9.41095890410959],
['17', 11.46],
['15', 38.5948275862069],
['21', 16.009174311926607],
['20', 21.525],
['02', 23.810344827586206],
['18', 13.20183486238532],
['03', 7.796296296296297],
['05', 10.08695652173913],
['19', 10.8],
['01', 11.383333333333333],
['22', 6.746478873239437],
['08', 10.25],
['04', 7.170212765957447],
['00', 8.127272727272727],
['06', 9.022727272727273],
['07', 7.852941176470588],
['11', 11.051724137931034]
]

However, what I get is:

[['21', 7.339449541284404],
['15', 0.8620689655172413],
['07', 2.941176470588235],
['05', 4.3478260869565215],
['18', 182.56880733944953],
['17', 5.0],
['13', 15.294117647058824],
['09', 4.444444444444445],
['20', 11.25],
['08', 4.166666666666666],
['02', 10.344827586206897],
['00', 27.27272727272727],
['04', 4.25531914893617],
['12', 4.10958904109589],
['14', 16.822429906542055],
['23', 2.941176470588235],
['03', 1.8518518518518516],
['22', 1.4084507042253522],
['11', 50.0],
['16', 1.8518518518518516],
['06', 50.0],
['19', 1.8181818181818181],
['01', 6.666666666666667],
['10', 1.694915254237288]]

I think it may be due to how I calculate the average:

avg_by_hour = []

for row in counts_by_hour:
    temp_comments = []
    hour = row
    no_posts = counts_by_hour[row]
    no_comments = comments_by_hour[row]
    avg_hour = int(no_comments)/int(no_posts)
    temp_comments = [hour,100*avg_hour]
    avg_by_hour.append(temp_comments)

However, when I look at the previous number of posts and number of comments, there also seems to be no discernible pattern there.

Even if I discount the absolute numbers, the peak hours in the guide’s results and in mine fall at different points.

Is anyone else facing the same problem?
Guided Project Exploring Hacker News Posts.ipynb (20.9 KB)

In the snippet of the code below, I think there’s a typo in the else portion.

for row in result_list:
    dthour = row[0]
    dthour = dthour.strftime("%H")
    if dthour not in counts_by_hour:
        counts_by_hour[dthour] = 1
        comments_by_hour[dthour] = row[1]
    else:
        counts_by_hour[dthour] += 1
        comments_by_hour[dthour] = row[1]

The last line should probably be comments_by_hour[dthour] += row[1] to update the comments_by_hour dictionary.
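
For reference, here is a minimal sketch of the loop with that one-line fix applied. The sample rows in result_list and the names created_at and n_comments are made up for illustration; only the dictionary names come from the code above.

```python
import datetime as dt

# Hypothetical sample rows: [creation time, number of comments].
result_list = [
    [dt.datetime(2016, 8, 16, 9, 55), 6],
    [dt.datetime(2016, 11, 22, 13, 43), 29],
    [dt.datetime(2016, 5, 2, 9, 14), 1],
    [dt.datetime(2016, 8, 2, 13, 5), 3],
    [dt.datetime(2016, 1, 1, 15, 0), 10],
]

counts_by_hour = {}
comments_by_hour = {}

for created_at, n_comments in result_list:
    hour = created_at.strftime("%H")  # zero-padded hour as a string, e.g. "09"
    if hour not in counts_by_hour:
        # First post seen for this hour: start both tallies.
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = n_comments
    else:
        # Later posts: add to both tallies instead of overwriting.
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += n_comments

print(counts_by_hour)
print(comments_by_hour)
```

With plain assignment (=) in the else branch, comments_by_hour would only keep the comment count of the last post seen in each hour, which would explain averages that are far too small.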

In the code for the average, I can’t figure out why the avg_by_hour is being multiplied by 100. When I edited the code for the comments_by_hour and took out the 100*, I get the same results you see in the guided project solution.

I hope that helps.


Oh, thanks. I can’t figure out what I was thinking when I made that typo.

The 100 was from when I was trying to force the numbers closer to the guide’s, as they were otherwise very small. Thank you so much, April :grinning:

Hello everyone,
I got stuck on this code. I found that the code marked in red works for me and gives the appropriate result, but I still can’t figure out how exactly it works. Can anyone please explain it in detail?

Regards


Hi @deodattatijare,
This is my version of the code shown here in red.

avg_by_hour = []  # Empty list to hold the average comments per post for each hour

for key in hourly_comment:  # Iterate over each key (hour) of the dictionary
    total_comments = hourly_comment[key]  # Total comments for this hour
    number_of_posts = hourly_post[key]  # Number of posts for this hour

    avg_comment_per_post = total_comments / number_of_posts  # Average comments per post

    avg_by_hour.append([avg_comment_per_post, key])  # Append [average, hour] to the list

print("The list of Average comments received per post in each hour is below \n \n", avg_by_hour)


I assume that you have created a dictionary to store the number of comments per hour. Here it is saved in the hourly_comment dictionary.

That dictionary looks like this:

{'19': 2513, '15': 17124, '09': 832, '20': 2530, '17': 3968, '14': 3639, '11': 1630, '23': 1261, '13': 5980, '02': 2022, '21': 2997, '16': 3001, '07': 1037, '06': 949, '00': 1372, '03': 1403, '04': 1611, '22': 2336, '10': 2213, '12': 3234, '18': 3222, '08': 1639, '01': 1232, '05': 1139}

You can see the key:value pairs of the dictionary, for example '19': 2513.
Here '19' is the key and 2513 is the value.

When you type dictionary[key], it returns the value. In this case, hourly_comment['19'] returns 2513. Note that the keys are strings, so the quotes are needed.
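
One detail worth checking: in the dictionary printed above, the keys are strings such as '19', not integers. A small sketch using three of the entries above shows the difference (the .get call is only there to avoid raising a KeyError in the example):

```python
# Three entries copied from the hourly_comment dictionary above.
hourly_comment = {"19": 2513, "15": 17124, "09": 832}

# The string key matches an entry.
print(hourly_comment["19"])

# The integer 19 is a different key and is not in the dictionary,
# so .get returns None (hourly_comment[19] would raise KeyError).
print(hourly_comment.get(19))
```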

Similarly, we have created another dictionary to store the number of posts in each hour. In the above code it is saved as the hourly_post dictionary. The keys in this dictionary are the same: the hours.

{'19': 62, '15': 103, '09': 18, '20': 52, '17': 58, '14': 66, '11': 33, '23': 39, '13': 64, '02': 37, '21': 51, '16': 62, '07': 26, '06': 31, '00': 29, '03': 38, '04': 25, '22': 50, '10': 44, '12': 55, '18': 69, '08': 29, '01': 32, '05': 18}

This is the hourly_post dictionary.

You can see that there is a value corresponding to '19', which is 62.

So when we run the for loop over the keys of either of these dictionaries, we can use the same key to access the corresponding values in both dictionaries.

So '19' in hourly_comment returns 2513, the total number of comments for that hour, and
'19' in hourly_post returns 62, the number of posts for that hour.

When you divide these two, you get the average comments per post for that hour. This value is stored in avg_comment_per_post.
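
As a quick worked example of that division, using only the two values quoted above for hour 19 (the variable names match the code earlier in this post):

```python
total_comments = 2513   # value for hour 19 in hourly_comment
number_of_posts = 62    # value for hour 19 in hourly_post

# True division gives a float: the average comments per post for that hour.
avg_comment_per_post = total_comments / number_of_posts

print(round(avg_comment_per_post, 2))
```

This comes out at roughly 40.5 comments per post for that hour.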

We need to add this value to the empty list avg_by_hour that we created earlier. We add both the hour and the average to this list as a two-element list. The hour is the same as the key, and the average is stored in avg_comment_per_post.

You can append either [key, avg] or [avg, key]. I used the [avg, key] order so that the list can be sorted in ascending order of average values.

Thus we have a list of lists, each holding an hour and the average for that hour.
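
To show why the [avg, key] order is convenient, here is a minimal sketch of the sorting step. It uses three of the hours from the dictionaries above; the name sorted_avgs and the choice of reverse=True (highest average first) are my own assumptions, not part of the guided project.

```python
# Three entries copied from the dictionaries above.
hourly_comment = {"19": 2513, "15": 17124, "09": 832}
hourly_post = {"19": 62, "15": 103, "09": 18}

avg_by_hour = []
for key in hourly_comment:
    avg_by_hour.append([hourly_comment[key] / hourly_post[key], key])

# Lists compare element by element, so sorting a list of [avg, key]
# lists orders them by the average first.
sorted_avgs = sorted(avg_by_hour, reverse=True)

for avg, hour in sorted_avgs:
    print(f"{hour}:00 -> {avg:.2f} average comments per post")
```

If the inner lists were [key, avg] instead, the same call to sorted would order them by hour rather than by average.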

I hope this helps. Let me know if I didn’t address your actual problem!