Exploring Hacker News

Please let me know how I can improve the code for this project.

https://app.dataquest.io/c/62/m/356/guided-project%3A-exploring-hacker-news-posts/5/finding-the-number-of-ask-posts-and-comments-by-hour-created
Exploring Hacker News Post (1).ipynb (15.9 KB)

Hi @gaurav_taskar!
Thanks for sharing your project here! This is a great way to get helpful feedback on your project, and I have even learned skills outside the scope of what Dataquest teaches as a result!

First, I want to compliment your efficiency in working through the project. It seems like you cover a lot of ground in a relatively short document.

It also looks like you have the right approach to all the answers in your project. Most of the comments I have are about readability. There’s a nice quote about code readability, but I forget the exact wording, so here’s a paraphrased/butchered version:

Write your code so that someone can understand it 12 months from now, because that person will probably be you.

My individual recommendations are in the sections below.

Use Python f-strings to simplify string formatting

@Elena_Kosourova introduced me to Python f-strings when I shared my first project. They are a powerful way to simplify your string formatting, because you do not have to define ‘template’ strings for each output.

For example:

template = "Average comments on ask posts are "
print(template + str(avg_ask_comments))

This can be rewritten as:

print(f'Average comments on ask posts are {avg_ask_comments}') 
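
f-strings also accept a format specifier after a colon, which is handy for averages with long decimal tails. Here is a minimal sketch; the value assigned to avg_ask_comments is made up for illustration and is not taken from the notebook.

```python
# Hypothetical value standing in for the notebook's avg_ask_comments.
avg_ask_comments = 14.038417431192661

# ":.2f" formats the number as fixed-point with two decimal places.
message = f"Average comments on ask posts are {avg_ask_comments:.2f}"
print(message)
```

The variable keeps its full precision; only the displayed text is rounded.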

Add in-block comments and spacing to clarify individual code operations

When I took a look at your code, I was intimidated by the larger blocks of code that I saw. Take In[13], for example:

result_show_list = []
for post in show_posts:
    created_at = post[6]
    num_points = int(post[3])
    result_show_list.append([created_at, num_points])

print(result_show_list[0:5])

count_show_posts = {}
points_show_posts = {}
date_format_show = '%m/%d/%Y %H:%M'
for row in result_show_list:
    date = row[0]
    points = row[1]
    date_final = dt.datetime.strptime(date, date_format_show)
    time = date_final.strftime('%H')
    if time in count_show_posts:
        count_show_posts[time] += 1
        points_show_posts[time] += points
    else:
        count_show_posts[time] = 1
        points_show_posts[time] = points

points_show_posts
count_show_posts

I had to walk through your code line by line to understand what each block was doing, which made it harder to review how you approached the problem. Adding a few comments and visually breaking the code into functional chunks could help both you and future co-workers understand your reasoning. For example, here is how I would approach documenting your In[13]:

# Collect the creation time and number of points for each show post, then preview the first five.
result_show_list = []
for post in show_posts:
    created_at = post[6]
    num_points = int(post[3])
    result_show_list.append([created_at, num_points])
print(result_show_list[0:5])

# Parse each creation time, extract the hour, and accumulate post counts and points per hour in dictionaries.
count_show_posts = {}
points_show_posts = {}
date_format_show = '%m/%d/%Y %H:%M'
for row in result_show_list:
    date = row[0]
    points = row[1]
    date_final = dt.datetime.strptime(date, date_format_show)
    time = date_final.strftime('%H')
    
    # If this hour is already a key in the dictionaries, update its totals.
    #     Otherwise, create the key with initial values.
    if time in count_show_posts:
        count_show_posts[time] += 1
        points_show_posts[time] += points
    else:
        count_show_posts[time] = 1
        points_show_posts[time] = points

# Display results (print both, since a notebook cell only displays its last bare expression)
print(points_show_posts)
print(count_show_posts)
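
If you want to try the per-hour logic outside the notebook, here is a minimal self-contained sketch. The sample rows are made up for illustration; only the column positions (index 3 for points, index 6 for the creation time) follow the code above. It also swaps the if/else for dict.get with a default of 0, which is a different technique from the one in your notebook, and the names count_by_hour and points_by_hour are my own.

```python
import datetime as dt

# Hypothetical sample rows; only index 3 (points) and index 6 (created_at) are used.
show_posts = [
    ["1", "Show HN: A", "url", "10", "2", "user_a", "8/4/2016 11:52"],
    ["2", "Show HN: B", "url", "5", "0", "user_b", "8/4/2016 11:05"],
    ["3", "Show HN: C", "url", "7", "1", "user_c", "9/26/2016 3:26"],
]

date_format = "%m/%d/%Y %H:%M"
count_by_hour = {}
points_by_hour = {}

for post in show_posts:
    # Parse the creation time and keep only the zero-padded hour, e.g. "03".
    hour = dt.datetime.strptime(post[6], date_format).strftime("%H")
    points = int(post[3])

    # dict.get returns 0 for an hour not seen yet, so no if/else is needed.
    count_by_hour[hour] = count_by_hour.get(hour, 0) + 1
    points_by_hour[hour] = points_by_hour.get(hour, 0) + points

print(count_by_hour)
print(points_by_hour)
```

Whether dict.get reads better than the explicit if/else is a matter of taste; both produce the same dictionaries.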

Combine similar assignment statements

This is a small comment, but in some cases it can make your code easier to read.
When you have very similar assignment statements right next to each other:

date = row[0]
comment = row[1]

You can combine them into a single line with tuple unpacking:

date, comment = row[0], row[1]
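
When a row holds exactly the values you need, you can also unpack it directly, including in the for loop header. This is a minimal sketch; the sample rows and the total_comments variable are made up for illustration.

```python
# Hypothetical row in the [created_at, num_comments] shape used in the project.
row = ["8/16/2016 9:55", 6]

# Unpack both values in one statement. This raises ValueError if row
# does not contain exactly two items.
date, comment = row

# The same unpacking works directly in a for loop header.
rows = [["8/16/2016 9:55", 6], ["11/22/2015 13:43", 29]]
total_comments = 0
for date, comment in rows:
    total_comments += comment

print(total_comments)
```

Unpacking in the loop header removes the separate indexing lines entirely, at the cost of requiring every row to have the same length.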

Include additional Markdown cells to help with project flow

As an outside reader, it can be challenging to follow someone’s thinking when only looking at their code. Using Markdown cells is a great way to help people step through your project without having to walk through the code line by line. In particular, I am thinking about how you list the primary questions of the project in the first Markdown cell but then do not call out which of your cells answers which question.

I want to stress that you did a good job with this project. As far as I can see, you thought through the problems correctly and got the right answers. I hope that these comments are not too overwhelming. If you have any questions, I would be happy to talk more!

Thank you so much for your reply. I highly appreciate your feedback, and it's very helpful in enhancing my coding and notebook presentation skills.