
High Level vs Low Level language concepts

On the first page of “Introduction to Numpy”, there is a description that explains NumPy: " Python is a high-level language, which means you don’t have to allocate memory manually. With low-level languages, you have to define memory allocation and processing, which gives you more control over performance, but it also slows down your programming. NumPy gives you the best of both worlds: processing performance without all the allocation."

There are so many computer science concepts crammed into this single paragraph, and I don’t have a firm grasp of what it actually means since I don’t have a computer science background. What is memory allocation? What does it mean to allocate memory manually? What is memory processing, and why does defining allocation and processing give you more control over performance? What is control over performance anyway? …

The above paragraph alone led to so many questions that I couldn’t answer. Could anyone give me some advice on whether specifically understanding this part (difference between high and low level language) is important, and if it is, what sources can I refer to in order to completely understand what that paragraph means?

Python suddenly becomes really difficult once you try to study the computer science behind it, such as this part. I would appreciate any tips on understanding these kinds of “computer” concepts that don’t have anything to do with the coding itself.

It is not important to learn this at this stage, and possibly not ever depending on your learning and career goals. Go ahead with the lesson and you’ll do fine.

I may provide an answer to the technical part later, if no one else does in the meantime.

Would it be necessary to know it if one is aiming to become a data analyst or data scientist?

Most likely the answer is no. It’s hard to give an absolute and narrow answer because there are exceptions and many kinds of projects and tasks can fall under the umbrella of those roles, but, to repeat myself: most likely the answer is no.

Libraries like NumPy exist in part so that data folks don’t have to worry about such things.

Thank you very much for the advice :slight_smile: I’ll focus on the Dataquest exercises for now.

You don’t really need to understand it, but since you’re (naturally) left wondering how it makes a difference:

Every character on this page (and every piece of data you process with Python) has to be stored temporarily in something called RAM (you can look it up). With low-level languages like C, you have to decide, up front, how much memory you are going to reserve for the data you are reading and processing. If it turns out the text/data is larger than what you reserved/allocated, you ask the system to reserve a bigger space, copy your existing data to the new space, read/add in your new data, then delete the old space. This is very tedious to program. Python (and NumPy) just do it for you in the background without you having to specify all the steps.

But there are times when this is less efficient. If we wanted to capitalize every instance of the word ‘data’ on this page, Python would make a copy of everything as it was changing d to D. A low-level C program, however, could just go through the text changing each d to D in place. This uses less space in RAM and is faster, but it quickly complicates whatever processing you are programming, because more than one C variable could be using that same memory. Python and its libraries typically use C in the background to speed up these processor-intensive operations while protecting you from such common low-level C problems.

As a former C programmer, I was delighted to discover how Python simplified almost everything. Yes, you give up some processing speed, but it frees you up to just think about the real problem you are trying to solve with the data you have, rather than what’s going on at the machine level.