How I automated a non-profit with Python

How I automated a non-profit with Python

I volunteer in a non-profit called AIESEC. It is a nice platform for young people who want to develop their leadership and soft skills while having a positive impact on the world by volunteering or doing internships abroad.

Our organization works more or less as a modern company (for example, we have international, national and local executive boards). And of course, we have weekly meetings and reports on which we base our next steps to achieve goals. One of our KPIs (Key Performance Indicators) is the number of sign ups on our website which we categorize by referrals (like, Instagram, Facebook, friends and so on). Our marketing responsible had to go through the list of sign ups on Google Sheets, determine the necessary week and count the number of sign ups by referral to create a weekly report. It would take her 15-20 minutes and sometimes she would make a math mistake.

I already did a lot of automation of her report tools using Google Spreadsheets formulas (which seem some sort of magic to many of the people I know). But in this case, the basic functionality wasn’t enough thus I used Python to automate the weekly report creation.

In this article, I will describe how I did it to save people’s time and energy.

Sample Data

I will use two examples of data I had to work with. In the first spreadsheet, you will find the date a person signs up and the corresponding referral. In the second spreadsheet, you will find weekly reports.

The screenshot of the samples are below.

What Exactly I had to Do?

Before starting to code I had to think through the problem to fully understand what needs to be done:

  1. I have to create a service account to allow the Python script to perform actions on the sheets. A good tutorial on how to get your JSON file with the private key can be found here.
  2. I have to group the selected rows by referral and compute the size of each group.
  3. I have to transfer these numbers to the report spreadsheet where…
  4. …I have to create a new sheet for the current week with a pie plot.

Preparation

I used two external libraries to work with Google Sheets: gspread and gspread_pandas. The former allows us to interact with sheets: modify, create, delete, and so on, while the latter allows us to push pandas dataframes to sheets which we will need after reshaping the data with pandas.

Since I was planning to create other scripts using these libraries (and also to separate our code in different modules) I created a small config file that reads the JSON file to authenticate us in Google API.

import gspread
import gspread_pandas

# Authorize in both packages
gc = gspread.service_account("./credentials/google_secret.json") # gspread
c = gspread_pandas.conf.get_config("./credentials/") # gspread_pandas

The code is straightforward. In the case of gspread we create a Client object that we can use to interact with sheets, while gspread_pandas returns a string with the JSON file content which we will use to authorize the package in Google API.

Now we are ready to write the main module. First of all, let’s import the packages and set some basic variables.

# Import libraries
import pandas as pd
from datetime import datetime as dt
from gspread_pandas import Spread
from config import gc, c

# Return authorization configs
gc = gc # Authorization for gspread
c = c # Authorization for gspread_pandas

# Set today and date format
today = dt.today()
date_format = "%d/%m/%Y"

We will work the dates (like weeks) so we need the datetime built-in Python module. We also import variables from the config module that we will use for the authentication. And finally, we will need the information of the present day (to find out which was the previous week) and set the data format used in the sheet with referrals.

Downloading and Manipulating the Data

In this section, we will pull the data from the sign-up sheet and transform it to the format we will upload to the report sheet.

First of all, let’s prepare the data.

# Open the SU MKT worksheet
su_mkt = Spread("1p_7-lZ24dO2fBVSDFDJtoKW0--yvIrd2uAhs2ECdBXs", sheet="SU MKT", config=c)

# Convert marketing sign ups to a dataframe
su_mkt_df = su_mkt.sheet_to_df(index=0, header_rows=1)

# Rename columns from integers to meaningful names
su_mkt_df.columns = ["su_date", "referral"]

# Replace empty spaces with NaN
su_mkt_df.replace("", float("NaN"), inplace=True)

# Drop NaN values (or blank spaces at the end of the dataframe)
su_mkt_df.dropna(subset=["su_date"], inplace=True)

# Convert su dates to datetime objects
su_mkt_df["su_date"] = pd.to_datetime(su_mkt_df["su_date"], format=date_format)

In the above code, we open the sheet with sign-up dates and referrals, using the authentication we did in the config module. Next, we transform the table to a dataframe, where we rename the columns for easier identification and replace all blank spaces with NaN values to easily drop them from the dataframe. Finally, we transform dates in datetime objects using the format we defined in the beginning and create a new column with week numbers to ease the creation of the weekly report.

In the next code cell, we will handle some special cases and group the data by week.

New Year Special Case

There is a small problem which happens with the week number at the beginning of each year. Let me first show you the code with comments.

# In datetime the first week of a new year before the first Sunday is 00
# so we have to change this value to 52 or the last week of the previous year

# Example
# dt(2020, 12, 31).strftime("%U") -> 52
# dt(2021, 1, 1).strftime("%U") -> 00

su_mkt_df.loc[su_mkt_df["week"] == "00", "week"] = "52"

# Remove leading zeros from weeks and transform them to integers
su_mkt_df["week"] = su_mkt_df["week"].str.lstrip("0")

As you can see, the problem with strftime() is that it treats the first week of the year as week 00 so that if 31st December is the 52nd week, 1st January will always be week 00 till the first Sunday of the week. This split can occur, for example, on Wednesday which will make our report incorrect on New Year. Thus, what I did in the code is to transform the first week of the year to the last week of the previous year (so 00 becomes 52), and the first week of the year will start from 01.

After resolving this problem we have to transform them to integers, removing the leading zero beforehand (because, otherwise, pandas won’t be able to make them integers and throw an error.

Some other Special Cases and Data Grouping

Now, it’s time to group the data and convert it to a list of lists to ease data upload on Google Sheets.

We will also account for the New Year special case. As I said before, the strftime() function will return 0 for the first week of the year till Sunday but we know that this is the 52nd week. Our script will run on Sunday and we have to subset the week before (which goes from Sunday to Saturday. It is easily achieved by subtracting 1 from today’s week number, creating a boolean mask for this week and slicing the dataframe.

But what if today’s 1st January? today.strftime("%U") will return 0 and if subtract 1 we’ll get -1 week which does not exist. However, we know that the week before 0 (the first week of the year) is actually week 51, hence we check for this in the first if statement and correctly slice the dataframe.

What if today’s the first week? If subtract 1 we’ll get week 0 which does not exist in our dataframe because we changed it to 52. Hence, we need to check also this condition and slice the dataframe for week 52.

# Subset the previous week (the script runs on Sunday)
# So the previous week would be from Sunday till Saturday
# We also have to account for the first week of the year
if int(today.strftime("%U")) == 0:
    su_mkt_subset = su_mkt_df[su_mkt_df["week"] == 51] # If today's 0s week we have to take the 51 week (the second-last week of the year)
elif int(today.strftime("%U")) == 1:
    su_mkt_subset = su_mkt_df[su_mkt_df["week"] == 52]  # We have to take the 52nd week if today's is the first week of the year otherwise it will be 0
else:   
    su_mkt_subset = su_mkt_df[su_mkt_df["week"] == str(int(today.strftime("%U")) - 1)]

Finally, we will have to group the data by referral and count the number of their occurrences in the week. We will also transform the resulting Series to a list of lists because it’ll be easier to push it to Google Sheets.

# Group by referral and count
grouped = su_mkt_subset.groupby("referral")["su_date"].count()

# Transform Series to DataFrame
grouped = pd.DataFrame(grouped)

# Convert Series to list of lists
grouped = grouped.reset_index().values.tolist()

Pushing the Data to Weekly Report Sheet

The grouped variable is a list of lists:

[['Facebook', 3],
 ['Friends', 1],
 ['Instagram', 1],
 ['Other', 1],
 ['Search Engine', 3]]

Where each list contains the referral’s name and the number of sign-ups with that referral. All we need to do now is to push this list to the report sheet. We would also like to have a pie chart that automatically updates with the new data and also retain all the sheets to look up the results of a particular week in case we need them.

Hence, we need to duplicate a sheet, clear it from the old values and update it with the new ones.

First of all, we duplicate the last sheet, clear it and name it with today’s date.

# Weekly report sheet
sh = gc.open_by_key("1cjD9wTSDCxb3TS9nYHU6TLtcvOAR7flcg1okB7QQ6xI")

# Return the last sheet to duplicate
last_sheet = sh.get_worksheet(-1)
# Duplicate the sheet: name is the today’s date
new_sheet = last_sheet.duplicate(new_sheet_name = "{}".format(today.strftime(date_format)), insert_sheet_index=1)
 
# Clear the Count column
range_to_clear = new_sheet.range("B2:B16")
for cell in range_to_clear:
    cell.value = ''

new_sheet.update_cells(range_to_clear)

Now, we have to get the referrals’ names from the first column because we will compare them with the names from the grouped variable and in case of the coincidence, update the second column “Count” (in the report sheet) with the number from the grouped variable.

# Lists of lists of value in the range A2:A16 (only referral)
new_sheet_referrals = new_sheet.get("A2:A16")

This above variable (new_sheet_referrals) has this structure:

[['Class Presentation'],
 ['Facebook'],
 ['Friends'],
 ['Ads'],
 ['Other'],
 ['Website'],
 ['Email'],
 ['Flyer'],
 ['Search Engine'],
 ['Instagram']]

The last thing which remains is to loop through the list of lists in the grouped variable and through the list of lists of the new_sheet_referrals variable, compare the first values in each inner list and if they are the same, update the corresponding cell in the report sheet. Let’s check the code:

for l in grouped:
    referral = l[0]
    for ix, v in enumerate(new_sheet_values):
            if v[0] == l[0]:
                    new_sheet.update_cell(ix+2, 2, l[1]) 

First, we loop through the first list of lists (grouped) from which we extract the referral name (the first element of each inner list). Then we use the enumerate method which allows us to keep track of both the element’s index and the element itself.

We need the index, because we will use it to update the right row in the report sheet. So, if the referrals in both lists are the same, we take the cell which has the index of the referral from the report sheet which is equal to the referral from the grouped variable. Thus, we update the row which is equal to the index + 2 with the number of the current referral from the grouped variable. We have to add 2 to the index because enumerate starts from 0 while Google Sheets indices from 1 and we also have the header row in the report sheet.

Finally, you can schedule this script to run every week with, for example, crontab.

Conclusions

In this article, I showed how easy it can be to understand the repetitive patterns other people do and apply your Python knowledge to save their time with just a few lines of code that for them will seem some kind of wizardry.

2 Likes

Great job, Artur! And perfect way to apply Python to optimize real working tasks! :star_struck:

1 Like

Thank you, Elena! Really appreciate your words :smile:

1 Like