Guided Project: Hacker News Posts

Hello, Dataquest community! I just finished my second project and am looking for some constructive feedback.

This is my fourth week on Dataquest, and I am having lots of fun learning to code around data! I started with absolutely no knowledge of coding, but have already learned so much in that short period of time. Dataquest is an exceptional platform with an amazing community!

Feedback on my use of Markdown, code comments, and overall structure would be greatly appreciated!

I can’t get nbviewer to work for the life of me and always get a 404: Not Found error, so I’ll link my GitHub instead.

Many thanks,
Andy

Exploring+Hacker+News+Posts.ipynb (24.5 KB)


5 Likes

@Andy nice job on this. I just finished the project myself and did the bare minimum, but I am looking to expand on it and make it a nice portfolio-worthy project. This is a nice example of an expanded project. Gives me some food for thought.

One suggestion… I noticed you did a lot of formatting around ‘section headers’ in your output like ‘Posts per Hour’ and ‘Comments per Hour.’ How about creating a function that would apply a ‘standardized formatting’ around any of those output section headers? That way you just call the function with the string you want to use. Just a suggestion to minimize some code.

Otherwise really nice example and thanks for sharing!

2 Likes

@chris_is_working, thanks for the reply!

I really like the idea of creating a function for all the output headers. Something like that didn’t even come to mind while I spent all those extra minutes manually formatting everything, and definitely would have been helpful.

I went ahead and took a shot at it, and came up with these two variations of the function:

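def ez_format(string):
    s = ""

    # Top rule: text length plus 2 for the '|' on each side.
    for _ in range(len(string) + 2):
        s = s + '-'

    # Placeholder line, filled in by format() below.
    s = s + '\n' + '|{}|' + '\n'

    # Bottom rule, same width as the top.
    for _ in range(len(string) + 2):
        s = s + '-'

    s = s.format(string)

    return s

def ez_format(string):
    s = ""
    # Index at which the text line is inserted, right after the top rule.
    flag = len(string) + 2

    # One pass: top rule, the text line, then the bottom rule.
    for i in range(len(string) * 2 + 5):
        if i < flag:
            s = s + '-'
        elif i == flag:
            s = s + "\n" + "|{}|" + "\n"
        else:
            s = s + '-'

    s = s.format(string)

    return s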
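
For comparison, here is a shorter sketch of the same idea that uses string repetition instead of loops. It is meant to produce the same output as the two versions above, and it sticks to str.format so it should also run on older Python 3 versions. The print call is only an illustrative usage:

```python
def ez_format(string):
    """Return `string` boxed between two dashed rules."""
    # Same width as the loop versions: text length plus the two '|' characters.
    rule = '-' * (len(string) + 2)
    return '{}\n|{}|\n{}'.format(rule, string, rule)


print(ez_format('Posts per Hour'))
```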

Thanks for the suggestion!

1 Like

Hi @Andy

Some additional thoughts.

If you are really interested in formatting string output, custom functions might get the job done. However, I find this a rather cumbersome approach in general, especially because the functions often don’t translate nicely to different notebooks or Python scripts. One option is to use the Markdown cells in notebooks for the heavy lifting and avoid caring too much about the actual output formatting of the returns from code cells. A lot of people tend to take this approach.

Alternatively, starting with Python version 3.6, so-called f-strings are another way of formatting string output, and they make things a lot easier to handle. So one could also get more acquainted with them. Additional info: Real Python: f-strings or PEP 498

Caveat: DQ currently runs Python 3.4.3, so no f-strings in the online interface at the moment.
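
To give an idea of what that looks like, here is a minimal sketch that formats one output line per hour with an f-string. It needs Python 3.6 or later; the function name format_row and the sample values are made up for illustration and are not results from the notebook:

```python
# Hypothetical sample values, not results from the notebook.
avg_comments_by_hour = [("15", 38.59), ("02", 23.81), ("20", 21.52)]


def format_row(hour, average):
    # {average:.2f} formats the number with two decimals;
    # ":00" is plain text appended after the hour.
    return f"{hour}:00  {average:.2f} average comments per post"


for hour, average in avg_comments_by_hour:
    print(format_row(hour, average))
```

On Python versions before 3.6, the same format spec works with str.format, for example "{}:00  {:.2f}".format(hour, average).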

Best
htw

4 Likes

@htw thanks for the comment on f-strings. Did not know about those but the RealPython article you linked explains it well and it looks like a more elegant way to accomplish string formatting. Always something new to learn!

By the way, would you have an example notebook that uses the Markdown cells to help with structuring explanations so that we can avoid unnecessary strings and comments within the code itself? Curious if you just mean simple sentences to describe the output or more like headings. Would be nice to see a really good example of the style you’re referring to.

@chris_is_working

This is straying somewhat away from the original post, but my thoughts below.

Maybe I should have phrased this somewhat more cautiously. From my experience, a lot of people don’t care much about formatting code cell output at all when using notebooks. And I guess there are good reasons for that. If you are sharing your notebook with a fellow researcher or data scientist, they are mostly interested in what is going on. In this sense, a) pretty printing of dataframes and other pandas objects in combination with good code (this includes overall readability and documentation) already helps a lot and b) intricate string formatting doesn’t add a lot of value as long as you are not printing total gibberish. On the other hand, if you are sharing the notebook with colleagues or other people who come from different fields, they are probably more interested in graphical output or explanations in the Markdown cells. Again, the added value of nicely formatted string output is in my opinion not so great. This being said, I personally worry primarily about clean, nicely documented code in combination with helpful Markdown cells when revising notebooks. Additionally, I tend to take issue with badly presented and labelled visuals.

Further, when you look at some notebooks, even the Markdown formatting is mostly basic (different font sizes, lists, etc.). Again, it is a question of how much time you want to invest here. However, in principle there are options. Markdown supports tables, and if you really want to go crazy, you could install a notebook extension to use Python variables in Markdown cells.
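
As a rough sketch of the table option, the helper below builds a pipe-style Markdown table as a plain string. The name markdown_table and the sample values are hypothetical, chosen only for illustration:

```python
def markdown_table(headers, rows):
    """Build a pipe-style Markdown table string from headers and rows."""
    lines = [
        '| ' + ' | '.join(headers) + ' |',
        # Separator row that marks the line above as the table header.
        '| ' + ' | '.join('---' for _ in headers) + ' |',
    ]
    for row in rows:
        lines.append('| ' + ' | '.join(str(cell) for cell in row) + ' |')
    return '\n'.join(lines)


# Hypothetical sample values, not results from the notebook.
table = markdown_table(['Hour', 'Avg. comments'],
                       [['15:00', 38.59], ['02:00', 23.81]])
print(table)
```

The printed text can be pasted into a Markdown cell, where Jupyter should render it as a table. From a code cell, it can be rendered with display(Markdown(table)) after importing both names from IPython.display.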

Best
htw

2 Likes