STEP 1 (course 2/2): Guided Project: Exploring Hacker News Posts

https://app.dataquest.io/m/356/guided-project%3A-exploring-hacker-news-posts/5/finding-the-amount-of-ask-posts-and-comments-by-hour-created

import datetime as dt

result_list = []
for row in ask_posts:
    temp = []
    temp.append(row[6])       # created_at timestamp string
    temp.append(int(row[4]))  # number of comments
    result_list.append(temp)
print(result_list[:6])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    temp = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    temp = temp.strftime("%H")
    if temp not in counts_by_hour:
        counts_by_hour[temp] = 1
        comments_by_hour[temp] = row[1]
    else:
        counts_by_hour[temp] += 1
        comments_by_hour[temp] += row[1]

ValueError: time data '09/12/16 23:57' does not match format '%m/%d/%Y %H:%M'

Changing %Y to %y doesn't help. With %y, a similar error occurs:
ValueError: time data '9/26/2016 2:53' does not match format '%m/%d/%y %H:%M'

In the CSV data file, some rows use the mm/dd/yyyy format and others use mm/dd/yy.

Please help me solve the error above.
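Since the file mixes two-digit and four-digit years, one standard-library approach is to try each format in turn and keep the first one that parses. This is a minimal sketch, not the course's official solution: the helper name `parse_created_at` and the `KNOWN_FORMATS` tuple are hypothetical, and it assumes these two formats are the only ones in the file.

```python
import datetime as dt

# Formats reported in this thread: 4-digit year first, then 2-digit year.
# Assumption: no other date formats appear in the data file.
KNOWN_FORMATS = ("%m/%d/%Y %H:%M", "%m/%d/%y %H:%M")


def parse_created_at(text):
    """Hypothetical helper: try each known format until one parses."""
    for fmt in KNOWN_FORMATS:
        try:
            return dt.datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("unrecognized date format: " + repr(text))


# Both timestamps quoted in the error messages above.
print(parse_created_at("9/26/2016 2:53").strftime("%H"))   # -> 02
print(parse_created_at("09/12/16 23:57").strftime("%H"))   # -> 23
```

Inside the counting loop, `parse_created_at(row[0])` would then replace the single `strptime` call. A timestamp in any third format still raises `ValueError`, which helps surface unexpected data.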

Sorry, sir. In your code, temp = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M"), if row is a string rather than a list, row[0] is only the first character of the date. Try printing row[0] to check the data before you use it.
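To see the difference this reply points at, compare indexing a plain string with indexing a two-item list. The sample timestamp and comment count are taken from the output posted later in this thread.

```python
# If each row is a plain string, row[0] is just its first character.
row_as_string = "8/16/2016 9:55"
print(row_as_string[0])        # -> 8

# If each row is a [created_at, num_comments] list, row[0] is the whole timestamp.
row_as_list = ["8/16/2016 9:55", 6]
print(row_as_list[0])          # -> 8/16/2016 9:55
print(type(row_as_list[1]))    # -> <class 'int'>
```

Passing the single character "8" to `strptime` with a full date format is what produces a "does not match format" error even when the format string itself is right.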

Thanks a lot.
I checked it; it's saved as a list of lists.
The problem is in the CSV data file: some rows use the mm/dd/yyyy format and others use mm/dd/yy.

I am also having the same issue! I'm using the exact same strptime date and time format as you, but I'm also getting an error that it doesn't match the format. I'd appreciate any help on this as well!

Option 1: use the code below.
import datetime as dt
result_list = []
for row in ask_posts:
    created_at = row[6]          # post creation timestamp
    num_comments = int(row[4])   # number of comments
    result_list.append((created_at, num_comments))
result_list[:10]
Output:
[('9/26/2016 2:53', 7),
 ('9/26/2016 1:17', 3),
 ('9/25/2016 22:57', 0),
 ('9/25/2016 22:48', 3),
 ('9/25/2016 21:50', 2),
 ('9/25/2016 19:30', 1),
 ('9/25/2016 19:22', 22),
 ('9/25/2016 17:55', 3),
 ('9/25/2016 15:48', 0),
 ('9/25/2016 15:35', 13)]

Only go ahead if you get the same output as above. Otherwise, download the data file from the link below and restart.
https://www.kaggle.com/mcarn096/exploring-hacker-news-posts/data

Option 2: I got the following message from the Dataquest support team:

Hey Umesh!

This process becomes much, much simpler when you advance to the pandas missions, but in this case, you’ll need to do some pretty wacky string manipulation to get the dates to the format you want them to be in.

My suggestion, instead, would be to import the parser module from the dateutil library and use its parse() function instead of .strptime(). It would look something like this:

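# Requires the third-party python-dateutil package.
from dateutil import parser

for row in result_list:
    # parser.parse() infers the date layout, so 2-digit and 4-digit years both parse
    temp = parser.parse(row[0])
    temp = temp.strftime("%H")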

=================================

Hope the above works!

Option 1

import datetime as dt
result_list = []
for row in ask_posts:
    create_at = row[6]
    num_comment = int(row[4])
    result_list.append([create_at, num_comment])
print(result_list[:6])
Result (this is a list of lists):
[['8/16/2016 9:55', 6],
 ['11/22/2015 13:43', 29],
 ['5/2/2016 10:14', 1],
 ['8/2/2016 14:20', 3],
 ['10/15/2015 16:38', 17],
 ['9/26/2015 23:23', 1]]

Option 2

counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]

This is not a problem with the data; check the earlier step in your code.
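One way to check the earlier step is to confirm that every entry of result_list is a two-item list or tuple holding a string and an int before any date parsing happens. This is a sketch under that assumption about the expected row shape; the function name `check_result_list` is hypothetical.

```python
def check_result_list(result_list):
    """Hypothetical sanity check: each row should be [created_at, num_comments]."""
    for i, row in enumerate(result_list):
        if not isinstance(row, (list, tuple)) or len(row) != 2:
            raise TypeError(f"row {i} is not a 2-item list/tuple: {row!r}")
        created_at, num_comments = row
        if not isinstance(created_at, str):
            raise TypeError(f"row {i}: created_at should be str, got {type(created_at).__name__}")
        if not isinstance(num_comments, int):
            raise TypeError(f"row {i}: num_comments should be int, got {type(num_comments).__name__}")
    return True


# Rows shaped like the output above pass the check.
print(check_result_list([["8/16/2016 9:55", 6], ["11/22/2015 13:43", 29]]))  # -> True

# A flat list (timestamp and count appended separately) fails on row 0.
try:
    check_result_list(["8/16/2016 9:55", 6])
except TypeError as err:
    print(err)
```

Running a check like this right after building result_list points at the cell that built the list, rather than at the strptime call that fails later.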

Hey! Thanks for your helpful response.
I tried your "option 2" code. It looks similar (though not identical) to mine, but it runs into the same problem. When it tries to parse the datetime from row[0], it only gets the first character, "8", of the timestamp '8/16/2016 9:55'. So I need to make sure it uses the entire timestamp rather than just the month (8).

If I change the code to:
date = dt.datetime.strptime(row[0:], "%m/%d/%Y %H:%M")
then I get an error saying that type int is not subscriptable.

Please recheck this line:
date = dt.datetime.strptime(row[0:], "%m/%d/%Y %H:%M")
It should be row[0], not row[0:].
I've uploaded my notebook file; it may be helpful to you.

hacker_News_Post1.ipynb (32.3 KB)



Yes, it's a list of lists. Now I get it, thanks a lot!

Thank you very much! It turns out my error was actually in the previous cell, where I appended created_at and n_comments, though I don't fully understand the difference between the two ways of appending or when to use each. Thank you again!
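The difference can be seen with one sample row taken from the output earlier in the thread: appending the pair as a single list keeps the timestamp and comment count together, while appending them one after the other produces a flat list where each element is a single value.

```python
created_at = "8/16/2016 9:55"
n_comments = 6

# Appending both values as ONE item: a list of [created_at, n_comments] pairs.
paired = []
paired.append([created_at, n_comments])
print(paired)          # -> [['8/16/2016 9:55', 6]]
print(paired[0][0])    # -> 8/16/2016 9:55  (the whole timestamp)

# Appending them separately: a flat list alternating strings and ints.
flat = []
flat.append(created_at)
flat.append(n_comments)
print(flat)            # -> ['8/16/2016 9:55', 6]
print(flat[0][0])      # -> 8  (only the first character of the timestamp)
# Looping over flat, the second "row" is the int 6, so row[0] or row[0:]
# raises TypeError: 'int' object is not subscriptable.
```

With the paired version, `for row in result_list` yields whole `[created_at, n_comments]` rows, which is what the counting loop expects.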

Thanks a lot for the solution you provided. I did most of it on my own, but in some places it was very useful to look at your solution to understand the next steps.

# Use this code:
import datetime as dt
result_list = []
for row in ask_posts:
    created_at = row[6]   # post creation timestamp
    n = int(row[4])       # number of comments
    temp_list = [created_at, n]
    result_list.append(temp_list)

print(result_list[:5])

# Then:

counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    created_at = row[0]
    date = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")   # two-digit hour, e.g. "09"
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
print(counts_by_hour)
print("\n")
print(comments_by_hour)
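As a quick self-check, the same counting logic can be run on the ten sample rows from the Option 1 output posted earlier in this thread. This sketch also falls back to a two-digit-year format when the four-digit one fails; that fallback is an assumption based on the '09/12/16 23:57' error reported at the top, and none of the sample rows actually need it. It uses `dict.get` instead of the if/else, which gives the same counts.

```python
import datetime as dt

# Sample rows copied from the Option 1 output earlier in this thread.
result_list = [
    ("9/26/2016 2:53", 7), ("9/26/2016 1:17", 3), ("9/25/2016 22:57", 0),
    ("9/25/2016 22:48", 3), ("9/25/2016 21:50", 2), ("9/25/2016 19:30", 1),
    ("9/25/2016 19:22", 22), ("9/25/2016 17:55", 3), ("9/25/2016 15:48", 0),
    ("9/25/2016 15:35", 13),
]

counts_by_hour = {}
comments_by_hour = {}
for created_at, num_comments in result_list:
    try:
        date = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    except ValueError:
        # Assumption: the only other layout is a 2-digit year, e.g. '09/12/16 23:57'.
        date = dt.datetime.strptime(created_at, "%m/%d/%y %H:%M")
    hour = date.strftime("%H")
    counts_by_hour[hour] = counts_by_hour.get(hour, 0) + 1
    comments_by_hour[hour] = comments_by_hour.get(hour, 0) + num_comments

print(counts_by_hour)
# -> {'02': 1, '01': 1, '22': 2, '21': 1, '19': 2, '17': 1, '15': 2}
print(comments_by_hour)
# -> {'02': 7, '01': 3, '22': 3, '21': 2, '19': 23, '17': 3, '15': 13}
```

If your loop gives the same numbers for these rows, the counting step is working and any remaining error is in how result_list was built or in the date formats of the full file.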