Creating report from excel data

This is a data i am working on


I intend creating this kind of report from it
i have tried using pivot in pandas framework but i am not working out i am thinkiing the columns should have another one called years so i could use that sort it rather done it being sheet names

You can read each sheet into a dataframe then concatenate them into a single dataframe.

That poses another problem because each sheet represents data for a quarter in a year so bringing them together wont give accurate data.This is how the final report should look like

I would personally manipulate the dataframe so that each row would represent a year and quarter. Making it look like the following:

How would you manipulate the dataframe please?

You can iterate through each of the sheets then concatenate them to the first as follows:

import pandas as pd

#change path to reflect where your file is
path = "http://bit.ly/ekkpodogui"
excel_file = pd.ExcelFile(path)
sheets = excel_file.sheet_names
first_sheet = pd.read_excel(path, sheet_name=0)
first_sheet["Year"] = sheets[0].split()[0]
first_sheet["Quarter"] = sheets[0].split()[1]

#loops through the excel files sheets and concatenates to the first sheet
for sheet in range(1, len(sheets)):
    next_sheet = pd.read_excel(path, sheet_name=sheet)
    next_sheet["Year"] = sheets[sheet].split()[0]
    next_sheet["Quarter"] = sheets[sheet].split()[1]
    #print(first_sheet, next_sheet)
    
    first_sheet = pd.concat([first_sheet, next_sheet], ignore_index=True)

That should give you a dataframe like so:

2 Likes

Man it worked perfect i am so grateful


The rest of the columns can be seen when you scroll down.
Now what i am not getting is how to refrence the dataframe to create a pivot table i see this line
~print(first_sheet, next_sheet)~
gives me the data but how to reference it and use in pivot_table

That line was there to make sure it loop was doing what I wanted it to do, and in case you were interested as well. You can remove it if you want.

You can use df.pivot_table() to do that. What are you trying to pivot?

Am trying to use the restructured data to create the report i showed up there
This is what i tried doing
df = (first_sheet, next_sheet)
table = pd.pivot_table(df, index=[‘ACCOUNT’], columns=[‘Year’, ‘Quarter’], values=[‘PAST_DUE_DAYS’])
but i get this error message

Are you trying to make a plot with the restructured data? What is the end goal of the report?

No i am no creating a plot from it the end goal is to have a report which is easier to look at and see differences like this report

In that case, we’ll need to pivot the dataframe to get the needed output. You can use the following code:

first_sheet.pivot_table(index="Account Name", 
                                  columns=["Year", "Quarter"], 
                                  values=["Past Due Days"])

That should give you the following dataframe:

image

I have tried this and i keep getting this error

i saw what the problem was by changing these
first_sheet = pd.read_excel(path, sheet_name=0, header=2, skiprows=1)
and adding this
first_sheet.columns = [‘sn’,‘account’, ‘account_name’, ‘past_due_days’, ‘account_segment’]
first_sheet.pivot_table(index=“account”, columns=[“Year”, “Quarter”], values=[“past_due_days”])
now it only prints just one ie Q1 i want all printed out

I think what is happening is that there is only 1 quarter and year in the columbs. Use first_sheet[[“Year”, “Quarter”]].unique() to see what the different values you have in those columns. If there is only one year and quarter recreate the dataframe to have them all in it.

The fundamental problem i believe is with the column names let me share the excel with you so you run it and uncomment print(first_sheet, next_sheet) in the loop and look at the data
kindly get the data from here


you will get what am talking about it as some columns which is not needed like unamed, etc

Hi @jrfv88
Did you try reproducing the problem with the data i sent you?

I was able to see the problem. The first row of the file was’t the header, so you can bypass that by using the skiprows function to skip the first rows using pd.read_excel("filename", skiprows=1). There was also the issue that the next row after the header was empty you can chain the .dropna() method to drop that row as well. Here is the the modification of the original code to do the tasks listed above:

import pandas as pd

#change path to reflect where your file is
path = "loans.xlsx"
excel_file = pd.ExcelFile(path)
sheets = excel_file.sheet_names
#add the skiprows argument to skip the first row of the excel file.
#dropna drops the next column that doesn't contain any values
first_sheet = pd.read_excel(path, sheet_name=0, skiprows=1).dropna()
first_sheet["Year"] = sheets[0].split()[0]
first_sheet["Quarter"] = sheets[0].split()[1]

#loops through the excel files sheets and concatenates to the first sheet
for sheet in range(1, len(sheets)):
    next_sheet = pd.read_excel(path, sheet_name=sheet, skiprows=1).dropna()
    next_sheet["Year"] = sheets[sheet].split()[0]
    next_sheet["Quarter"] = sheets[sheet].split()[1]
    #print(first_sheet, next_sheet)
    
    first_sheet = pd.concat([first_sheet, next_sheet], ignore_index=True)

#pivoted is the dataframe that contains the account names by Year and Quarter
#to show the past due days.
pivoted = first_sheet.pivot_table(index="ACCOUNT NAME", columns=["Year", "Quarter"],
                        values="PAST DUE DAYS")

Here is what the end result will look like:

1 Like

Hi @jrfv88 i need your help ones more. this is my input data frame


Based on the value of “ACCOUNT SEGMENT” either term loans or overdraft i want to count movements from 10 to, 20, 30,40 or 50 per quarter and then number of 10s in each quarter. number of 20s in each quarter only. This is the intended output i am looking for NB image in next post

Intended output i am looking for