Guided Project: Exploring Hacker News Posts: Finding the amount of Ask post comments by hour

Screen Link: https://app.dataquest.io/m/356/guided-project%3A-exploring-hacker-news-posts/5/finding-the-amount-of-ask-posts-and-comments-by-hour-created

Your Code:

```
# two dictionaries to contain counts by hour and comments by hour
import datetime as dt
counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y  %H: %M"

for row in results_list:
    date_hour = row[0]
    comment = row[1]
    time = dt.datetime.strptime(date_hour, date_format).strftime("%H")
    
    if time in counts_by_hour:
        counts_by_hour[time] += 1
        comments_by_hour[time] += comment
    else:
        counts_by_hour[time] = 1
        comments_by_hour[time] = comment
```

What I expected to happen: I expected to see two dictionaries with specific counts in each

What actually happened:

```
ValueError                                Traceback (most recent call last)
<ipython-input-11-e57ca85ae1c5> in <module>()
      8     date_hour=row[0]
      9     comment=row[1]
---> 10     time=dt.datetime.strptime(date_hour, date_format).strftime("%H")
     11 
     12     if time in counts_by_hour:

/usr/lib/python3.4/_strptime.py in _strptime_datetime(cls, data_string, format)
    498     """Return a class cls instance based on the input string and the
    499     format string."""
--> 500     tt, fraction = _strptime(data_string, format)
    501     tzname, gmtoff = tt[-2:]
    502     args = tt[:6] + (fraction,)

/usr/lib/python3.4/_strptime.py in _strptime(data_string, format)
    335     if not found:
    336         raise ValueError("time data %r does not match format %r" %
--> 337                          (data_string, format))
    338     if len(data_string) != found.end():
    339         raise ValueError("unconverted data remains: %s" %

ValueError: time data '8/16/2016 9:55' does not match format '%m/%d/%Y  %H: %M'
```

Other details: Thanks in advance

The one thing that popped out at me is that date_format seems to have some extra spacing in it: two spaces between the date and the time, and a space after the colon in %H: %M. Have you tried removing the extra spacing, as in "%m/%d/%Y %H:%M"?
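A small check on the timestamp from the traceback shows which space actually matters. This sketch uses only the standard datetime module; try_parse is a hypothetical helper written just for this demonstration. Python's strptime treats a run of whitespace in the format as "one or more whitespace characters", so the double space is tolerated, but the space after the colon demands a space in the data, and 9:55 has none.

```python
from datetime import datetime

sample = "8/16/2016 9:55"  # timestamp from the error message above


def try_parse(text, fmt):
    """Hypothetical helper: return a datetime, or None if strptime rejects text."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


original_format = "%m/%d/%Y  %H: %M"  # format from the first post
fixed_format = "%m/%d/%Y %H:%M"       # extra spaces removed

print(try_parse(sample, original_format))  # None
print(try_parse(sample, fixed_format))     # 2016-08-16 09:55:00

# Only the space after the colon breaks the match; a double space between
# the date and the time still matches the single space in the data.
print(try_parse(sample, "%m/%d/%Y  %H:%M"))  # 2016-08-16 09:55:00
print(try_parse(sample, "%m/%d/%Y %H: %M"))  # None
```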

Hi @april.g, I am also stuck on this task.

Removing the space didn't change anything for me:

```
import datetime as dt

result_list = []

for row in ask_post: 
    created_at = row[6]
    num_com = int(row[4])

    result_list.append(created_at)
    result_list.append(num_com)
print(result_list[:1])


counts_by_hour = {}
comments_by_hour = {}
date_format = ("%m/%d/%Y  %H:%M")

for row in result_list: 
    created_at = row[0]
    num_com = row[1]
    time = dt.datetime.strptime(created_at, date_format)
```

When trying to create a datetime.datetime object from this string, I get this error:

```
ValueError                                Traceback (most recent call last)
<ipython-input-37-bfa891fa8575> in <module>()
     19     created_at = row[0]
     20     num_com = row[1]
---> 21     time = dt.datetime.strptime(created_at, date_format)
     22 
     23 

/usr/lib/python3.4/_strptime.py in _strptime_datetime(cls, data_string, format)
    498     """Return a class cls instance based on the input string and the
    499     format string."""
--> 500     tt, fraction = _strptime(data_string, format)
    501     tzname, gmtoff = tt[-2:]
    502     args = tt[:6] + (fraction,)

/usr/lib/python3.4/_strptime.py in _strptime(data_string, format)
    335     if not found:
    336         raise ValueError("time data %r does not match format %r" %
--> 337                          (data_string, format))
    338     if len(data_string) != found.end():
    339         raise ValueError("unconverted data remains: %s" %

ValueError: time data '8' does not match format '%m/%d/%Y  %H:%M'
```

I guess that in this case the dataset does not use two digits for the months before October (1 to 9), and so %m (which is a two-digit format) is causing the error.

Is my guess plausible?
Starting from this guess, how can I modify the format to make the month two digits in the list of lists? Or is there a better way?

Thank you for help :smiley:

For the date string, there still looks to be an extra space between the date and the time. However, the specific error you're seeing (ValueError: time data '8' does not match format '%m/%d/%Y %H:%M') is likely related to the creation of result_list: notice that the time data is just '8', not a full timestamp. Check out this post that explains in more detail how this error comes about and what you can do to resolve it.
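For anyone wondering where the lone '8' comes from: appending created_at and num_com separately makes result_list a flat list, so in the second loop each row is a single string (or integer) and row[0] is only its first character. The sketch below uses two made-up rows in the same column layout as the code above (comment count at index 4, creation time at index 6); the rows themselves are hypothetical. It also shows that %m in strptime accepts single-digit months, so the month format is not the problem.

```python
from datetime import datetime

# Hypothetical rows in the same layout as ask_post above:
# index 4 = number of comments, index 6 = creation time.
ask_post = [
    ["1", "Ask HN: first example", "", "2", "6", "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: second example", "", "5", "29", "user_b", "11/22/2015 13:43"],
]

result_list = []
for row in ask_post:
    result_list.append(row[6])       # appends a string...
    result_list.append(int(row[4]))  # ...then an int: the list stays flat

print(result_list)  # ['8/16/2016 9:55', 6, '11/22/2015 13:43', 29]

first_row = result_list[0]  # a whole timestamp string, not a [date, comments] pair
print(first_row[0])         # 8  (just the first character)

# %m matches one- or two-digit months, so the full string parses fine.
parsed = datetime.strptime(first_row, "%m/%d/%Y %H:%M")
print(parsed.month)  # 8
```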

@april.g Thank you, you are the best !

Thanks @april.g for the guidance on the error.

For the benefit of other community members:

Since we are asked to create a list of lists, the columns we want to append to results_list should be appended together as one inner list, for example:

results_list.append([column1, column2])

This should fix the issue.
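Applied to the code earlier in the thread, the fix might look like the sketch below. The ask_post rows are hypothetical placeholders in the same column layout (comment count at index 4, creation time at index 6). Converting the comment count with int() at this point also avoids the string concatenation problem discussed further down the thread.

```python
# Hypothetical rows in the same layout as ask_post:
# index 4 = number of comments, index 6 = creation time.
ask_post = [
    ["1", "Ask HN: first example", "", "2", "6", "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: second example", "", "5", "29", "user_b", "11/22/2015 13:43"],
]

result_list = []
for row in ask_post:
    created_at = row[6]
    num_comments = int(row[4])                      # convert to int up front
    result_list.append([created_at, num_comments])  # one inner list per post

print(result_list)  # [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29]]

# Each row is now a [date, comments] pair, so row[0] is the full timestamp.
for row in result_list:
    print(row[0], row[1])
```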

Hey @april.g,
I got stuck here:

```
import datetime as dt
counts_by_hour = {}
comments_by_hour = {}
date = []
date_format = '%m/%d/%Y %H:%M'
for row in result_list:
    hour = row[0]
    date_hour = dt.datetime.strptime(hour, date_format).time()
    date_date = dt.datetime.strptime(hour, date_format).date()
    time = date_hour.strftime("%H")
    comment = row[1]
    date.append(time)

#print(date[:20])
for x in date:
    if x not in counts_by_hour:
        counts_by_hour[x] = 1
        comments_by_hour[x] = comment
    else:
        counts_by_hour[x] += 1
        comments_by_hour[x] += comment

print(counts_by_hour)
print(comments_by_hour)
```

That is what I did, and the result was:
```
{'20': 80, '07': 34, '14': 107, '03': 54, '02': 58, '21': 109, '08': 48, '15': 116, '00': 55, '17': 100, '10': 59, '22': 71, '06': 44, '09': 45, '19': 110, '01': 60, '13': 85, '05': 46, '11': 58, '18': 109, '04': 47, '12': 73, '23': 68, '16': 108}
{'20': '22222222222222222222222222222222222222222222222222222222222222222222222222222222', '07': '2222222222222222222222222222222222', '14': '22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222', '03': '222222222222222222222222222222222222222222222222222222', '02': '2222222222222222222222222222222222222222222222222222222222', '21': '2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222', '08': '222222222222222222222222222222222222222222222222', '15': '22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222', '00': '2222222222222222222222222222222222222222222222222222222', '17': '2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222', '10': '22222222222222222222222222222222222222222222222222222222222', '22': '22222222222222222222222222222222222222222222222222222222222222222222222', '06': '22222222222222222222222222222222222222222222', '09': '222222222222222222222222222222222222222222222', '19': '22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222', '01': '222222222222222222222222222222222222222222222222222222222222', '13': '2222222222222222222222222222222222222222222222222222222222222222222222222222222222222', '05': '2222222222222222222222222222222222222222222222', '11': '2222222222222222222222222222222222222222222222222222222222', '18': '2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222', '04': '22222222222222222222222222222222222222222222222', '12': '2222222222222222222222222222222222222222222222222222222222222222222222222', '23': '22222222222222222222222222222222222222222222222222222222222222222222', '16': '222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222'}
```

Also, this part is not clear to me:

  • If the hour isn’t a key in counts_by_hour:
    • Create the key in counts_by_hour and set it equal to 1.
    • Create the key in comments_by_hour and set it equal to the comment number.
  • If the hour is already a key in counts_by_hour:
    • Increment the value in counts_by_hour by 1.
    • Increment the value in comments_by_hour by the comment number.

thanks in advance :smile:

Hi @wALEED.tawheed, welcome to the community! I edited your post above because the code was hard to read when it got copy/pasted (you can click the edit button on your post to see how I got the editor to do that: just add triple backticks before and after the code :slight_smile:).

I pasted the code into a copy of my project to have a look at what was going on. The counts_by_hour dictionary is okay, the values match what I have. The problem is with comments_by_hour. You have comment = row[1] within the first loop of your code, but nothing is done with it until the 2nd loop. The problem with doing it this way is that the 2nd loop is separate, and it will only use the last value of comment from the first loop. (That value seems to be pretty large, too, so there might be something fishy elsewhere, but I can’t tell from here.)
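Here is a small illustration of that second point, using the first three rows of result_list shown later in this thread. A variable assigned inside a loop keeps only its last value once the loop ends, so a second loop that reuses it adds that same value every time.

```python
result_list = [["8/16/2016 9:55", 6], ["11/22/2015 13:43", 29], ["5/2/2016 10:14", 1]]

hours = []
for row in result_list:
    comment = row[1]          # overwritten on every pass
    hours.append(row[0])

print(comment)  # 1  (only the last row's value survives)

# Second, separate loop: it never sees 6 or 29, only the leftover 1.
two_loop_total = 0
for hour in hours:
    two_loop_total += comment
print(two_loop_total)  # 3

# Using each row's own value gives the real total.
one_loop_total = 0
for row in result_list:
    one_loop_total += row[1]
print(one_loop_total)  # 36
```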

You can simplify your code by not using the separate date list and keeping everything in one loop instead of two, so that each pass uses both values from the same row of result_list.

It sounds like the part that’s tripping you up has to do with building the two dictionaries at the same time? I’m not sure what isn’t clear for you, so my explanation might not be helpful. Because we’re using the same key in both dictionaries at the same time, the if/else only needs to check whether the key exists in one of them; the instructions say counts_by_hour, but it could just as easily have been the other one. If, while looping through result_list, the hour we extracted isn’t a key yet, we add it to both dictionaries. If it is already a key, we update both dictionaries. The counts_by_hour dictionary just counts how many times we see each hour, so it starts at 1 and increments by 1 whenever the key already exists. The comments_by_hour dictionary is a little different: instead of starting at 1, it starts with the number of comments from that row of result_list, and once the key exists, we keep adding each row’s comments to it.
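Putting that together in a single loop might look like the sketch below. The sample rows are taken from the result_list output shown later in this thread, with the comment counts already converted to integers; two of the hours repeat so that both branches of the if/else run.

```python
import datetime as dt

# A few [created_at, num_comments] rows from later in this thread.
result_list = [
    ["8/16/2016 9:55", 6],
    ["11/22/2015 13:43", 29],
    ["5/2/2016 10:14", 1],
    ["8/2/2016 14:20", 3],
    ["10/15/2015 16:38", 17],
    ["11/16/2015 9:22", 1],
    ["9/22/2015 13:16", 1],
]
date_format = "%m/%d/%Y %H:%M"

counts_by_hour = {}
comments_by_hour = {}

for created_at, num_comments in result_list:
    hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
    if hour not in counts_by_hour:
        # First time we see this hour: start both dictionaries.
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        # Hour already seen: one more post, plus its comments.
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

print(counts_by_hour)    # {'09': 2, '13': 2, '10': 1, '14': 1, '16': 1}
print(comments_by_hour)  # {'09': 7, '13': 30, '10': 1, '14': 3, '16': 17}
```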

If that doesn’t help, let me know which part is still confusing and I’ll see what I can do. :slight_smile:

Thanks a lot @april.g,
but something fishy is going on with that code.
I followed your steps and got some changes, but guess what? The comment numbers are hilarious;
I can’t even read them.

The code:

```
import datetime as dt
counts_by_hour = {}
comments_by_hour = {}
comment_list = []
date = []
date_format = '%m/%d/%Y %H:%M'
for row in result_list:
    hour = row[0]
    date_hour = dt.datetime.strptime(hour, date_format).time()
    date_date = dt.datetime.strptime(hour, date_format).date()
    time = date_hour.strftime("%H")
    comment2 = row[1]
    comment_list.append(comment2)
    date.append(time)
    if time not in counts_by_hour:
        counts_by_hour[time] = 1
        comments_by_hour[time] = (comment2)
    else:
        counts_by_hour[time] += 1
        comments_by_hour[time] += (comment2)
print(max(comment_list))
print(counts_by_hour)
print(comments_by_hour)
```

And the result is:

```
97
{'13': 85, '01': 60, '12': 73, '23': 68, '16': 108, '04': 47, '08': 48, '07': 34, '06': 44, '00': 55, '10': 59, '15': 116, '03': 54, '02': 58, '20': 80, '14': 107, '19': 110, '18': 109, '11': 58, '09': 45, '22': 71, '05': 46, '21': 109, '17': 100}
{'13': '29177532554209216155234511044695117225264137914363126112663692491210214262138834831192638131222231826286613', '01': '334124433131922139311443765213421223142262353617282623812422193222069721262514', '12': '41711731035824123372521169116063225177912102514394273382122611194167152264588531121156213', '23': '145217122152181511271187657948696563141132949424104461461456265133351202151111282', '16': '177194140231091726822112513325229212251153996124537181211431021161163551521351111212334826911422312521234319449198325151548271691088012', '04': '3374131251218211125313622182242111221087214693261911121752', '08': '513083422911612414122278349112214523511429512334215111532', '07': '2311122361125320225112389863619212214621271', '06': '11239675222101221368105012142211437311254636722122522', '00': '1013611244332281392933421111214144311043338143531012427234321295215', '10': '131197112122133711254618349109904332211141014125061112444423131094311218258101', '15': '1697212521812509336322122631611232310114791032671623432831401281411189111161611225193414418185119471311328241283273149765745192211158591221622022605321233731', '03': '12251111821516914123284732321311512212041633344217296239241132415141', '02': '32230715184411118411265115116221312267101467251881631414201915424121218686', '20': '2374212241112183813264226406715144671242403477452421322374172335833191283295210612168221315532869', '14': '32223511722215111431110114111664285371221254141151213221239451112312811231111901162622215221425211669611850110211106629152211221323212925218', '19': '311527623433131851321532553142446181531229172228131064341241727336375253654111143437215427122211241242161823675111351151726131111812', '18': '2356156617220411112833221011614173312111966722531342117479732731011131111134738343312711121321751871224322232330101641122121921327225912304199', '11': '2285426653284141462166587114315128313623101326231231362111141241329', '09': '61560124232414731131211212211424221043554421281172', '22': '28194931113116724349202361213612161235524672133421316552363162912221916191622293410521', '05': '2922105311022361117169143171342216722513138560325102224642', '21': '4432032125712191713321010119110621232371614126232520217211951517241147113134102922175414321216106331503228181251153711310724543301423818', '17': '12731118220182151055592511262224512916231625412654152372319656135347512514322126825122611111014312242224362628213143452432515'}
```

:frowning: :frowning:

Wow, that’s a pretty crazy result there! :anguished:

When I run the code back in my copy of the project, my results don’t look anything like that (it ran perfectly!). I’m thinking the problem with the large numbers might be in result_list. You could inspect the results list by running result_list[:5] in a separate cell and seeing what the output looks like. You should get something like this:

```
[['8/16/2016 9:55', 6],
 ['11/22/2015 13:43', 29],
 ['5/2/2016 10:14', 1],
 ['8/2/2016 14:20', 3],
 ['10/15/2015 16:38', 17]]
```

My guess is that the 2nd number in each list (the comments) is something different.
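One quick way to test that guess is to look at the type of the second item in each row rather than at its printed value (although in a printed list, strings do show up in quotes). The rows below are a hypothetical example, and comments_are_ints is a made-up helper name used only for illustration.

```python
def comments_are_ints(rows):
    """Hypothetical helper: True if every row's second item is an int."""
    return all(isinstance(row[1], int) for row in rows)


good_rows = [["8/16/2016 9:55", 6], ["11/22/2015 13:43", 29]]
suspect_rows = [["8/16/2016 9:55", "6"], ["11/22/2015 13:43", "29"]]

print(comments_are_ints(good_rows))     # True
print(comments_are_ints(suspect_rows))  # False
print([type(row[1]).__name__ for row in suspect_rows])  # ['str', 'str']
```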

If you upload a copy of your .ipynb file, I’ll happily take a look at it and see if I can help you troubleshoot.

Guided Project_ Exploring Hacker News Posts (2).tar (3.0 MB)

I uploaded it, but when I ran result_list[:20], this is what I got:

```
[['8/16/2016 9:55', '6'], ['11/22/2015 13:43', '29'], ['5/2/2016 10:14', '1'], ['8/2/2016 14:20', '3'], ['10/15/2015 16:38', '17'], ['9/26/2015 23:23', '1'], ['4/22/2016 12:24', '4'], ['11/16/2015 9:22', '1'], ['2/24/2016 17:57', '1'], ['6/4/2016 17:17', '2'], ['9/19/2015 17:04', '7'], ['9/22/2015 13:16', '1'], ['6/21/2016 15:45', '1'], ['1/13/2016 21:17', '4'], ['10/4/2015 21:27', '4'], ['1/25/2016 20:27', '2'], ['10/27/2015 2:47', '3'], ['1/19/2016 12:01', '1'], ['3/22/2016 2:05', '22'], ['9/8/2015 14:04', '2']]
```

I see the problem. When you created result_list, the comments weren’t changed to integers.

```
comments = row[4]
```

What ends up happening is that when it adds the comments, it’s doing string concatenation. So if for hour 20 you had the comment numbers '1', '15', and '26', it will just string them all together as '11526'.
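The difference is easy to reproduce with the three example values from that paragraph: += on strings joins them, while += on integers adds them.

```python
comments = ["1", "15", "26"]  # comment counts still stored as strings

joined = ""
for c in comments:
    joined += c        # string concatenation
print(joined)          # 11526

total = 0
for c in comments:
    total += int(c)    # numeric addition after converting
print(total)           # 42
```

Converting once, when result_list is built (comments = int(row[4])), means every later step already works with integers, so no further int() calls are needed in the counting loop.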

Thanks a lot,
I had just discovered it at the same moment.
Really, thanks a lot!
I resolved it with those three letters, int :slight_smile:

```
if time not in counts_by_hour:
    counts_by_hour[time] = 1
    comments_by_hour[time] = int(comment2)
else:
    counts_by_hour[time] += 1
    comments_by_hour[time] += int(comment2)
```

This website also helped me: http://www.pythontutor.com/visualize.html#mode=edit
Do you know of a website that does the same thing but is more powerful?

thanks again for your support

Hi @april.g,
I can’t figure out why this keeps giving me an error message.
Please help.
Thanks.
Basics.py (1.9 KB)

Hi @akheugiomho. Have a look at the date format.

I edited it, but still nothing.
Basics (2).py (1.9 KB)

Could you instead upload your .ipynb file?

Actually I think I spotted it. You’re using date as your iteration variable but referencing row, so it’s pulling up the last row from a previous loop.
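Since Basics.py isn't reproduced here, the snippet below is a hypothetical reconstruction of the kind of mix-up described: an earlier loop leaves row pointing at its last item, and a later loop iterates with date but still reads row.

```python
rows = [["8/16/2016 9:55", 6], ["11/22/2015 13:43", 29]]

# An earlier loop: when it finishes, `row` still refers to the last item.
for row in rows:
    pass

wrong = []
for date in rows:
    wrong.append(row[0])   # bug: reads the leftover `row`, not `date`

right = []
for date in rows:
    right.append(date[0])  # uses the current iteration variable

print(wrong)  # ['11/22/2015 13:43', '11/22/2015 13:43']
print(right)  # ['8/16/2016 9:55', '11/22/2015 13:43']
```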

Thank you soooooo much. It worked!

Hi guys,

I am still confused about when to use strptime vs strftime. The second one in particular feels like the opposite of strptime to me. Can someone please help me clarify this?
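A short illustration, using the same timestamp format as this project. One common way to remember them: strptime ("parse") is called on the datetime class and turns a string into a datetime object, with the format describing the input; strftime ("format") is called on a datetime object and turns it back into a string, with the format describing the output you want. So they do go in opposite directions, which is why the first post chains them to pull out just the hour.

```python
import datetime as dt

text = "8/16/2016 9:55"
date_format = "%m/%d/%Y %H:%M"

# strptime: string -> datetime (the format describes the input string)
parsed = dt.datetime.strptime(text, date_format)
print(repr(parsed))  # datetime.datetime(2016, 8, 16, 9, 55)

# strftime: datetime -> string (the format describes the output you want)
hour = parsed.strftime("%H")
print(hour)                          # 09
print(parsed.strftime("%Y-%m-%d"))   # 2016-08-16

# Chaining them, as in the first post, extracts just the hour as a string.
print(dt.datetime.strptime(text, date_format).strftime("%H"))  # 09
```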