Why aren't y'all using a context manager in this case?

https://app.dataquest.io/c/48/m/264/pipeline-tasks/8/counting-unique-request-types

Why are the files being left open in the solution code? Why not use a context manager?

For example:

with open('example_log.txt') as log:
    parsed = parse_log(log)
    with open('temporary.csv', 'r+') as file:
        csv_file = build_csv(
            parsed,
            file,
            header=['ip', 'time_local', 'request_type',
                    'request_path', 'status', 'bytes_sent',
                    'http_referrer', 'http_user_agent'],
        )
        # Read while the file is still open; both files close when the with blocks exit.
        contents = csv_file.readlines()
print(contents[:5])

Is there a drawback to doing it this way? Is it better?

I didn’t look at the details, but it’s extremely likely that what you propose is a better option.

As for why we’re not doing it that way, I’d say it’s a consequence of the content changing over time. We now teach closing files and context managers, but we haven’t always done that.
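
For what it's worth, here is a minimal sketch of the main practical difference: a `with` block closes the file when the block exits, even if an exception is raised partway through. The file name `demo.txt` and the helpers `read_first_lines` and `closed_after_error` are made up for illustration and are not part of the course code.

```python
import os
import tempfile


def read_first_lines(path, n=5):
    """Return the first n lines and whether the file was closed afterwards."""
    with open(path) as f:
        lines = f.readlines()
    # Once the with block has exited, the file object is already closed.
    return lines[:n], f.closed


def closed_after_error(path):
    """Raise inside the with block, then report whether the file was closed."""
    handle = None
    try:
        with open(path) as f:
            handle = f
            raise ValueError("simulated parsing error")
    except ValueError:
        pass
    return handle.closed


# Hypothetical demo data in a throwaway directory (not the course's log file).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as out:
        out.write("a\nb\nc\n")

    first, closed = read_first_lines(demo_path, 2)
    closed_on_error = closed_after_error(demo_path)

print(first, closed, closed_on_error)
```

Without the `with` block, you would need an explicit `try`/`finally` that calls `close()` to get the same guarantee.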


It’s ironic that the data engineer path is all about optimizing data pipelines/footprints but leaves out context managers…

Having just completed the data scientist path a few months ago and the data engineer path yesterday, I’m surprised that what’s taught in the data scientist path isn’t applied consistently across the whole Dataquest platform.

Frankly, the Data Engineer path feels unpolished and unfinished. It feels like it was rushed to be made available for the sake of appearing as a competitive MOOC…
