Guided Project: Hacker News Project using pandas

Hello everyone :blush:, I applied pandas to the Hacker News project to make the analysis easier and more explanatory, and I also applied some of the feedback I got.

I am really grateful for the feedback I got from @chefpaul92 and @Johnsonk51502 :people_hugging: cheers

Hacker News Analysis.ipynb (78.9 KB)

Hey @OlutokiJohn,

Looks great!

Formatting the tables is a nice touch which makes your project look presentable and visually appealing.

I’m not sure where you are in your data science journey or which tools you already know, but I can recommend two things to take your analysis further:

  1. Use Matplotlib to create some plots that illustrate the relevant insights you found in your analysis. Graphs are great because anyone can read them and understand what they mean.

  2. Find the best times to submit each kind of post (show posts, ask posts, etc.) and, if you decide to go forward with my first recommendation, make a plot of the average number of comments by hour.

These are only recommendations; your analysis looks good so far!

Finally, watch for spelling errors. I caught one or two, so be sure to fix those.
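A minimal sketch of the second suggestion, assuming the posts sit in a DataFrame with `title`, `num_comments`, and `created_at` columns. The column names and the four sample rows are made up for illustration and are not taken from the notebook:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows; the real analysis would load the Hacker News CSV instead.
posts = pd.DataFrame({
    "title": ["Ask HN: first", "Show HN: second", "Ask HN: third", "Ask HN: fourth"],
    "num_comments": [10, 4, 20, 6],
    "created_at": ["2016-08-04 15:02", "2016-01-26 15:30",
                   "2016-06-23 15:45", "2016-09-26 02:10"],
})

# Extract the hour of day (0-23) from the timestamp.
posts["hour"] = pd.to_datetime(posts["created_at"]).dt.hour

# Keep only ask posts, then average the comment counts per hour.
is_ask_post = posts["title"].str.lower().str.startswith("ask hn")
avg_comments_by_hour = posts[is_ask_post].groupby("hour")["num_comments"].mean()

# Bar chart of the hourly averages.
fig, ax = plt.subplots()
ax.bar(avg_comments_by_hour.index, avg_comments_by_hour.values)
ax.set(xlabel="Hour of day", ylabel="Average comments per post",
       title="Average comments on ask posts by hour")
```

In a notebook the chart should render inline; in a plain script you would likely add `plt.show()` or `fig.savefig(...)`. Swapping the `"ask hn"` prefix for `"show hn"` gives the same view for show posts.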

Good work and good luck!

Thank you, I’ll try that.

Hi @OlutokiJohn!
Thanks for sharing your project. One of the things I liked about the presentation of your project was the clever use of HTML throughout the markdown cells. I’m definitely going to steal that for future use…

I completely agree with @Johnsonk51502: visualizations using Matplotlib can really enhance the presentation of your findings. I have found that Matplotlib can be intimidating, so consider starting with Seaborn, a library built on top of Matplotlib that handles a lot of the formatting under the hood.

Example: the output of In [42] is a lot easier to interpret in graphical form:

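import pandas as pd
import matplotlib.pyplot as plt  # pyplot is the submodule conventionally aliased as plt
import seaborn as sns

# Comment counts keyed by hour of day (0-23)
comments_by_hour = pd.Series({0:4, 1:6, 2:13, 3:4, 4:4, 5:4, 6:4, 7:3,
                              8:6, 9:3, 10:11, 11:9, 12:11, 13:21, 14:24, 15:77,
                              16:34, 17:21, 18:30, 19:25, 20:37, 21:39, 22:11, 23:16})

# Line plot with the hour on the x-axis and the comment count on the y-axis
ax = sns.lineplot(data=comments_by_hour, linewidth=2.5)
ax.set(xlabel='Hour', ylabel='Comments');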

[image: line plot of comments by hour]

Here are a few additional comments:

  1. For In [12 - 15], consider adding some text to the printout so that the reader has context for the number without reading the code block.
    print(f'There are {len(hacker_news[ask_post])} ask posts in the dataset')
  2. Take care with your significant figures.
  3. Proofread your markdown cells: since Jupyter does not contain any text proofreading functionality, it is sometimes worthwhile to run your text through Word or Google Docs to catch typos.
  4. Use descriptive variable names. In In [42] I see df, dff, and df2. It’s hard for me to know what each of them means individually.
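To make points 2 and 4 concrete, here is a small sketch. The numbers are illustrative only (not results from the notebook), and the variable names are simply one suggestion for replacing names like df, dff, and df2:

```python
import pandas as pd

# Illustrative averages only; a descriptive name says what the Series holds.
avg_comments_by_hour = pd.Series({15: 38.5948, 2: 23.8103, 20: 21.5251},
                                 name="avg_comments")
avg_comments_by_hour.index.name = "hour"

# Option 1: round the stored values to two decimal places.
rounded_avg_comments = avg_comments_by_hour.round(2)

# Option 2: keep full precision in the data and round only when printing.
summary_lines = [
    f"{int(hour):02d}:00 -> {avg:.2f} average comments per post"
    for hour, avg in avg_comments_by_hour.items()
]
print("\n".join(summary_lines))
```

Rounding only at print time (option 2) keeps the underlying values intact for any later calculations, which is usually the safer default.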

Thanks again for sharing your project. You are off to a great start, and you taught me a few things about formatting notebooks. Thanks!
JKE

Hi @OlutokiJohn ,
Great job on the project! Your analysis was easy to follow and well commented. Great job on the charts and tables. Great job defining the goal of the project in the introduction. Adding a title to the tables was helpful. My only suggestion would be to maybe add a title to the last chart.
Great job!

Oh thanks a lot @casandra.data.analys