Automating the Reading-in of Several Excel Sheets

Dear DQ Community,

Grateful for this platform and the help you provide.

I am trying to read-in multiple sheets from excel and instead of having a new line of code every time I read in a new sheet, I wanted to automate this process.

excel_file = ‘XYZ_sample_data_analysis.xlsx’

xyz_1996 = 0

list_years = [‘xyz_1996’, ‘xyz_1997’, ‘xyz_1998’, ‘xyz_1999’, ‘xyz_2000’, ‘xyz_2001’, ‘xyz_2002’, ‘xyz_2003’, ‘xyz_2004’, ‘xyz_2005’]
sheet_num = 0

for each in list_years:
print(each)
each = pd.read_excel(excel_file, sheet_name = sheet_num)
sheet_num += 1
print(each)

I know that the loop works because I print the results from the sheet within the loop but when I call any of the variables outside of the loop, I get that it is not defined.

‘xyz_1996’ is not defined.

I’m wondering why this is.

I renamed each variable in the list without the single quotes (that is xyz_1996 instead of ‘xyz_1996’) to see if it would make any difference but it did not help

Thank you for your help!

1 Like

Hi @jk5209,

You can store the data frame to a dictionary like this:

data[each] = pd.read_excel(excel_file, sheet_name = sheet_num)

This way, you can use each data frame using the dictionary['key'] syntax:

data['xyz_1996']

In this mission screen, we are doing the same thing:

https://app.dataquest.io/m/136/data-cleaning-walkthrough/4/reading-in-the-data

import pandas as pd
data_files = [
    "ap_2010.csv",
    "class_size.csv",
    "demographics.csv",
    "graduation.csv",
    "hs_directory.csv",
    "sat_results.csv"
]
data = {}
for f in data_files:
    d = pd.read_csv("schools/{0}".format(f))
    key_name = f.replace(".csv", "")
    data[key_name] = d

Best,
Sahil

1 Like

Thank you Sahil! It works. :smiley:

1 Like