I have just started the data analysis in Python path and finished the fundamentals course (I completed 8 missions and almost one guided project). I have good experience with Excel: I used it for data analysis and visualization at work, and I can actually do a lot with it.
So far, I have not found that Python can do more than Excel, or do the same things more easily. On the contrary, Excel can accomplish tasks in less time and with simpler steps! Please guide me. I'm really confused, because I believe the vast community of data scientists who use code for data analysis must have a point!
Sorry for this post; I am hoping to find the answer.
Thanks a lot.
That is an interesting question. Here are a few points that come to mind when I compare Excel and Python (here "Python" means the vast range of libraries available to data scientists; Python is not just about data manipulation and visualization):
Reproducibility: when you edit a dataset in Python, the code records exactly which steps were taken to make the changes. Furthermore, you can reuse the same code to clean other datasets. In Excel, you have to repeat the same editing tasks by hand for each new file.
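To make the reproducibility point concrete, here is a minimal sketch, assuming pandas is installed. The function name clean_sales, the column names, and the two small tables are all made up for illustration; the idea is only that one function holds the cleaning steps and can be applied to any file with the same layout.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleaning steps to any dataset with these columns."""
    out = df.copy()
    # Normalize header text: strip spaces, lower-case, use underscores.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Drop rows that are exact duplicates.
    out = out.drop_duplicates()
    # Convert the amount column to numbers; unparseable entries become NaN.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Remove rows where the amount is missing.
    out = out.dropna(subset=["amount"]).reset_index(drop=True)
    return out


# Two made-up monthly exports with the same kinds of problems.
january = pd.DataFrame({" Region ": ["North", "North", "South"],
                        "Amount": ["100", "100", "n/a"]})
february = pd.DataFrame({" Region ": ["East", "West"],
                         "Amount": ["250", "75"]})

january_clean = clean_sales(january)
february_clean = clean_sales(february)
print(january_clean)
print(february_clean)
```

Every step is written down once, so next month's file goes through exactly the same treatment with a single function call.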
Scalability: an Excel worksheet is limited to 1,048,576 rows, so it cannot fully open datasets with millions of rows. In Python you can work with big datasets comfortably (here "big" means gigabytes of data; for terabytes of data we can still use Python with SQL integration). Moreover, data manipulation on large datasets is generally faster in Python than in Excel.
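One common way to handle a file that does not fit comfortably in memory is to read it in chunks. The sketch below assumes pandas is installed and uses a small in-memory CSV as a stand-in for a large file on disk; the column names are invented for the example.

```python
import io

import pandas as pd

# Stand-in for a large CSV on disk: 10,000 rows built in memory.
# A real file path could be passed to read_csv instead.
rows = ["order_id,amount"] + [f"{i},{i % 10}" for i in range(10_000)]
big_csv = io.StringIO("\n".join(rows))

total = 0
row_count = 0
# chunksize makes read_csv yield DataFrames of at most 1,000 rows each,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(big_csv, chunksize=1_000):
    total += int(chunk["amount"].sum())
    row_count += len(chunk)

print(row_count, total)
```

The same loop works unchanged whether the file has ten thousand rows or many millions; only the running time grows.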
Integration: when it comes to terabytes of data, we can use libraries that integrate SQL and other tools with Python. With just a basic knowledge of SQL, we can pull large amounts of data from databases and analyze it with the convenient pandas library.
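As a rough illustration of the SQL integration, the sketch below uses Python's built-in sqlite3 module and pandas.read_sql_query. The in-memory database, the orders table, and its values are made up and only stand in for a real database server.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for a real company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 100.0), ("South", 40.0), ("North", 60.0)],
)
conn.commit()

# Let the database do the aggregation, then continue in pandas.
query = (
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY region"
)
totals = pd.read_sql_query(query, conn)
conn.close()

print(totals)
```

The heavy lifting happens inside the database, and only the small summarized result is loaded into pandas.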
Data cleaning: I have worked with Excel, and I know data cleaning can become very cumbersome if you have a huge dataset with hundreds of columns. Even if you manage to open gigabytes of data, cleaning it in Excel can take many hours.
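For a wide dataset, the cleaning rules can be applied to every column at once. Here is a small sketch, assuming pandas and NumPy are installed; the 100-column table is randomly generated, and the "drop columns that are more than half empty, fill the rest with the median" rule is just one possible cleaning policy, not a recommendation.

```python
import numpy as np
import pandas as pd

# A made-up wide dataset: 100 numeric columns, 50 rows.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 100))
df = pd.DataFrame(data, columns=[f"col_{i}" for i in range(100)])
df.iloc[::5, ::3] = np.nan   # blank out every 5th row in every 3rd column
df["col_99"] = np.nan        # one column that is completely empty

# Drop columns with more than half of their values missing.
df = df.loc[:, df.isna().mean() <= 0.5]
# Fill the remaining gaps in every column with that column's median.
df = df.fillna(df.median())

print(df.shape, int(df.isna().sum().sum()))
```

Two lines handle all the columns, and the same two lines would handle a thousand columns just as well.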
Machine learning and other libraries: at this point, there is hardly any comparison. You could theoretically implement a neural network (NN) model in Excel, but it would take many days to build. Furthermore, even once it is built, you would not be able to modify the model easily. Here is an example that makes clear what we are comparing:
Suppose we have a dataset with 20 columns and 50,000 rows printed in a 100-page physical paper book. Imagine we try to fit a simple linear regression model using only a pocket calculator. First, it would take days to enter the data into the calculator. Second, we could make mistakes along the way. Third, we have not done any data cleaning yet: we would have to remove or replace the missing values by hand. This would be a painstaking job. With Excel, by contrast, we can do the regression analysis easily. And although data cleaning is easier in Excel than on paper, Python takes it to another level. The comparison between Excel and Python is similar: Excel is like a manual analysis done on paper for a small dataset. Python makes such analysis much more convenient and gives you full control over every change.
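For comparison, this is roughly what the regression example looks like in Python. It is a minimal sketch assuming pandas and NumPy are installed; the six data points are made up (they follow y = 2x + 1, with two values deliberately missing), and numpy.polyfit with deg=1 is used as one simple way to do an ordinary least-squares line fit.

```python
import numpy as np
import pandas as pd

# Made-up data following y = 2x + 1, with two missing values added.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, np.nan, 5.0],
    "y": [1.0, 3.0, 5.0, np.nan, 9.0, 11.0],
})

# Data cleaning step: drop rows where either value is missing.
clean = df.dropna()

# Least-squares fit of a straight line (a degree-1 polynomial).
slope, intercept = np.polyfit(clean["x"].to_numpy(), clean["y"].to_numpy(), deg=1)
print(round(slope, 3), round(intercept, 3))
```

The cleaning and the model fit are each a single line, and both would stay the same for 50,000 rows.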
Additionally, there are tons of libraries in Python that you will become more familiar with as you learn more. If you do some tasks repeatedly on your PC, they can be automated with Python. For this reason I am enjoying learning Python. I hope you will enjoy it too!
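As one example of automating a repetitive task, here is a sketch that stacks every CSV report in a folder into a single table, assuming pandas is installed. The function name combine_reports, the file names, and the columns are hypothetical; the demo writes two tiny files to a temporary folder so it can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_reports(folder: Path) -> pd.DataFrame:
    """Read every CSV file in a folder and stack them into one table."""
    frames = []
    for path in sorted(folder.glob("*.csv")):
        frame = pd.read_csv(path)
        frame["source_file"] = path.name  # remember where each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demo with two small files written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "week1.csv").write_text("item,qty\napples,3\n")
    (folder / "week2.csv").write_text("item,qty\npears,5\nplums,2\n")
    combined = combine_reports(folder)

print(combined)
```

Doing this by hand means opening each file and copying and pasting; with the script, adding a new weekly file to the folder is all it takes.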