Guided Project_Exploring Hacker News Posts


I am sharing my project, Guided Project_Exploring Hacker News Posts, for review. Please take a look. I actually completed this project in Nov 2019; I am posting it now to get your valuable feedback.
Guided Project_ Exploring Hacker News Posts.ipynb (183.6 KB)



Thanks for sharing your project, Nisrin! The code looks great and I like that you went beyond the project to try to find the answer to the timezone question. I’ve never seen the library you used and it inspired me to go check it out.

Your introduction lays out the questions that you’re hoping to answer from the dataset. You’ll want to revisit them with a conclusion at the end that brings everything together. Also, don’t be afraid to use Markdown cells to explain what you’re doing. When you prepare the project for presentation, you’ll probably want to edit out the additional print statements in the loop (cell 11 with the counts and comments by hour dictionaries).

Keep up the good work! :slight_smile:



Thanks for your valuable feedback.

I am still learning, so I didn’t know about pytz either; I found it through Google. I am not sure that the conversion output is right. If you check it out, please give me feedback on this code and its output, because by my calculation it should be 12:30am IST and I am getting 1:49am IST.

Thanks once more, I have noted all the valuable guidance provided, and I will implement it in all my future projects.

Okay, so I spent some time playing around with this. The issue seems to revolve around the default date of 1900-01-01 that gets filled in when we convert the hour in sorted_swap to a datetime object.

dt.datetime.strptime('15', '%H')
datetime.datetime(1900, 1, 1, 15, 0)

We don’t see the date, though, because we go ahead and use .strftime('%H:%M') to get the hours and minutes.

When we use the object with localize() in the est_time line, the UTC offset of the timezone comes out with odd minutes (I tried other time zones too and it was the same problem).

print(timezone('US/Eastern').localize(dt.datetime.strptime('15', '%H')))
# output:
1900-01-01 15:00:00-04:56   # should say 15:00:00-05:00

Notice the difference if we run the same code but put in a specific date with the time 15:

print(timezone('US/Eastern').localize(dt.datetime(2000, 1, 1, 15)))
# output
2000-01-01 15:00:00-05:00

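So what I did to try to fix the issue was create the datetime object with a different year by putting in a full date and converting lst[1] to an integer (because that’s what datetime() needs). I tried with the year 1950 and it worked okay, so the problem seems specific to very early dates like 1900. My guess is that it’s not so much a bug as pytz falling back to an old Local Mean Time offset for dates that far back, which would explain the -04:56.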
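As a quick sanity check on that workaround, here is a minimal sketch using only the standard library. It shows that strptime with just '%H' fills in 1900-01-01, and that building the datetime with an explicit date gives the same '%H:%M' string. The variable names (hour_text, parsed, fixed) are just for illustration and are not from the notebook.

```python
import datetime as dt

hour_text = '15'  # stands in for lst[1] from sorted_swap (illustrative value)

# Parsing only the hour leaves the date at the default 1900-01-01.
parsed = dt.datetime.strptime(hour_text, '%H')

# Building the datetime directly lets us choose the date ourselves.
fixed = dt.datetime(2000, 1, 1, int(hour_text))

print(parsed.date())            # 1900-01-01
print(fixed.date())             # 2000-01-01
print(parsed.strftime('%H:%M')) # 15:00
print(fixed.strftime('%H:%M'))  # 15:00
```

So swapping the date changes nothing in the printed hour; it only changes which UTC offset the timezone library picks when localizing.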

Here is the code I used with the workaround described. It gave the expected output (the time is an hour different, 01:30 IST vs 00:30 IST, because it’s using Eastern Standard Time and not Eastern Daylight Time).

import datetime as dt
from pytz import timezone

# sorted_swap, outputsentence and outputsentencea come from earlier cells in the notebook
for lst in sorted_swap[:5]:
    # the date is never printed; it only decides the UTC offset (January 1 means EST)
    hour = dt.datetime(2000, 1, 1, int(lst[1]))
    comment_avg = lst[0]
    print(outputsentence.format(hr = hour.strftime('%H:%M'), avg = comment_avg))
    est_time = timezone('US/Eastern').localize(hour)
    ist_time = est_time.astimezone(timezone('Asia/Calcutta')).strftime('%H:%M')
    print(outputsentencea.format(hr = ist_time, avg = comment_avg))


15:00 in Eastern Time in the US: 38.59 average comments per post
01:30 in IST: 38.59 average comments per post
02:00 in Eastern Time in the US: 23.81 average comments per post
12:30 in IST: 23.81 average comments per post
20:00 in Eastern Time in the US: 21.52 average comments per post
06:30 in IST: 21.52 average comments per post
16:00 in Eastern Time in the US: 16.80 average comments per post
02:30 in IST: 16.80 average comments per post
21:00 in Eastern Time in the US: 16.01 average comments per post
07:30 in IST: 16.01 average comments per post
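
To double-check the arithmetic behind the different results in this thread, here is a rough sketch with fixed UTC offsets from the standard library instead of pytz zones. The convert helper is hypothetical, and the +05:53 offset for Asia/Calcutta is an assumption on my part (a guess at what pytz uses for a 1900 date). It is included only because, combined with the -04:56 seen above, it reproduces the 1:49am result.

```python
import datetime as dt

def convert(hour, src_offset, dst_offset):
    """Move a wall-clock hour from one fixed UTC offset to another, as 'HH:MM'."""
    src = dt.timezone(src_offset)
    dst = dt.timezone(dst_offset)
    moment = dt.datetime(2000, 1, 1, hour, tzinfo=src)  # the date is arbitrary here
    return moment.astimezone(dst).strftime('%H:%M')

ist = dt.timedelta(hours=5, minutes=30)             # India Standard Time
est = dt.timedelta(hours=-5)                        # Eastern Standard Time
edt = dt.timedelta(hours=-4)                        # Eastern Daylight Time
old_eastern = dt.timedelta(hours=-4, minutes=-56)   # the -04:56 offset seen with 1900
old_calcutta = dt.timedelta(hours=5, minutes=53)    # ASSUMED offset for 1900, not confirmed

print(convert(15, est, ist))                    # 01:30
print(convert(15, edt, ist))                    # 00:30
print(convert(15, old_eastern, old_calcutta))   # 01:49
```

The first line matches the output above (standard time), the second matches the 12:30am that was expected (daylight time), and the third matches the original 1:49am.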