Help drinking the kool-aid

I have been taking the Data Science using Python training for a few months and think it’s great.
My problem is I can’t seem to figure out how to really implement Python and it’s functionality in my daily work. I always end up using Excel, Tableau, and PostgreSQL.
Any suggestions on how to really switch over and take the plunge? I feel like, until I do, I will never really reach my potential.
Thanks!

3 Likes

Hi @marksegal!! Welcome to our community :slight_smile:

I’m kind of in a similar boat at work where I’m expected to primarily work with just MySQL, Excel, and Power BI. Over time, I’ve found several ways to incorporate the use of Python! Mostly, I use it for complex Data Filtering/Cleaning that would be too much of a pain to do in Excel. We store some of our data in a CRM, and the entries there are made manually (through forms) by people external to the company, so it’s often really messy.

A lot of this I do in Jupyter Notebook, because I’m very fond of the readability allowed by its use of Markdown formatting. Sometimes, I also run Python scripts from within the Power BI Query Editor which is where Data Modelling can be performed on a Dataset you read in. I’ve used this to do things like using the Pandas library to create a categorical column that’s based on very stringent conditions, or using the predict_proba method from the scikit-learn library on a trained model to acquire the probability of a certain event happening, and using the results to classify things like how urgently each item needs attention, etc.

I’ve also explored the use of Python for entirely non-Data related things, like automating the sending of personalized emails to individual persons, renaming a bunch of a files in a folder easily, etc! It helps to be in a frame of mind of constantly trying to think of a more efficient way to automate away something laborious and repetitive. Chances are often pretty high that 1) There’s some way Python is able to help you, and 2) Someone on the internet has tried performing the same tasks using Python before!

5 Likes

Thank you for the reply!

Do you have an example of something you were doing in Excel that you realized was much faster, easier in Python?

One of my challenges is that I don’t tend to work with huge data sets. If I have a table with 1 million rows, its odd. I am usually doing work with like 10K rows of data. Pretty small.

You are right though… I do think I just have to “force myself” to start pulling away from Excel and my comfort zone.

1 Like

Mostly it’s been stuff like making conditional columns that follow very strict and convoluted conditions, across multiple variables, i.e. “If value in column a is more than column b, but time was before 10.30pm, but the difference between columns d and e is at least 10%… etc”. Stuff that’s a giant headache in Excel, but which a beginner might be able to do in 15-25 lines of code on Python. Often, I also do preliminary data exploration in Python - which doesn’t have to be complex, I could be using stuff as simple as the .info() and .value_counts() methods - and then export it into a CSV which is much more malleable and easy to work with in Excel.

Also, especially stuff like regex, and things like counting whether a particular substring occurs in a column more than once, and if so how often, etc.

You’re on to something about having to force yourself out of comfort zone. It took a couple of weeks for me to start venturing away from Excel and back into Python, but it’s definitely been paying off!

Edit: Couple of more recent use-cases of Python at work for me:

  • Splitting a large PDF file into multiple different PDFs. Turns out the free version of Adobe Reader doesn’t have this as a default function!
  • Every 6 hours, we have this first table of outstanding issues that updates. What we’re supposed to do is manually highlight each item to copy it, and verify if it’s already in this other second table. I used Pandas to completely automate this - now I just copy the entire table (as opposed to each individual item), paste it into an excel sheet, and then run a Python script that reads in both this table as well as the master sheet, and then spits out information relevant to what items are new, and what items already exist in the master sheet but are marked as fixed (and so can be removed from the 1st table). One of my colleagues considered me a wizard for figuring out this way to do things even though it’s really just exceedingly basic pandas.
4 Likes

Thanks again! Much appreciated!

2 Likes

I hope this helps. I wrote a script that makes my report writing a lot faster.

https://github.com/Uchencho/Excel-Script-for-Mundane-Tasks

1 Like

Very cool! Thanks for sharing.

I actually was just handed a project that I think will finally be the one to get me to use python to solve the problem. I am looking forward to it and, if possible, I will share it as well.

Thanks again!

1 Like

From my own experience:

I had received a bunch of excel files with various information in a tabular form but lots of duplication and unnecessary info mixed in. You can use pandas to read them all either at once or one by one and then clean up and retrieve the necessary information. Then save them as csv (I like to make my code modular) to apply different processing when needed.

This was in a fast paced business environment and I had specifically talked to the IT people to get Python installed.

My advice is this: Use whatever you feel most comfortable to get the best result as fast as possible. BUT when the amount of time it takes to do it once is comparable to the time you it will take to use Python to automate a task you foresee repeating then go with Python. Think of it as building you bag of tools and experience that will increase future productivity.

I hope this helps!!!

2 Likes