How to fix "KeyError: 'Level Column_A not found'" - restructure dataframe using set_index & unstack

I’ve been trying to figure this out for over 3 months now…
I would like to divide one columns’ value into multiple columns by condition.

When I tried set_index & unstack, “KeyError: ‘Level Column_A not found’” came up.

This is the code I tired

import pandas as pd

dic = {'Question_ID': ["AD","AD","AD","HG","HG","JI","JI","JI","JI","OP","RT","ER","ER","ER","ER"],
    'Answer_ID': ['BB', 'BB', 'BB', 'GG', 'GG', 'UI', 'UI', 'UI', 'UI', 'LL', 'TT', 'RR', 'RR', 'RR', 'RR'],
    'Section_ID': ['X'] * 9 + ['Z'] * 6,
    'Person_Incharge': ['Y'] * 9 + ['Q'] * 6,}
df = pd.DataFrame(dic)
dfx = df.set_index(["Section_ID", "Person_Incharge"]).unstack(["Question_ID","Answer_ID"])

Output:

KeyError: 'Level Column_A not found'

My data below background:

  • This is question&answer data
  • Each Section_ID have 3 respondents
  • I would like to divide each “Question” and “Answer” into 3 separate columns in order to see the consensus between 3
  • Under “Section_ID”, each respondents’ questions and answers are unique
  • Other:
    Question’s data is equal to Answer’s data (Question=Answer)
    Each Section_ID&Interview_date has the same value repeated

====================================================
What I would like to do:

  • Create new columns (divide each “Question” and “Answer” into 3 separate columns, original Section_ID and Interview_date can remain the same)
  • Assign Question&Answer’s values into different columns
  • I would like Section_ID and Interview_date aligned the longst value;
  • Add blank if the columns(Question&Answer) doesn’t have any more values
    Thank you for reading!

Hello,

I am very bad at reading things, but I did try.

I am not sure about solving the unstack problem, but have you considered just making a boolean series out of your Question and Answers?

Though, I’m not sure if that’s what you want.

Trying.ipynb (30.4 KB)



Click here to view the jupyter notebook file in a new tab

Something like this:

df_split_questions = df.copy()
question_ids = ['AD', 'HG', 'JI', 'OP', 'RT', 'ER']

for id in question_ids:
    df_split_questions['Question_ID_' + id] = df['Question_ID']==id

df_split_answers = df_split_questions.copy()

answer_ids = ['BB', 'GG', 'UI', 'LL', 'RR']
for id in answer_ids:
    df_split_answers['Answer_ID_' + id] = df['Answer_ID']==id

df_split_answers.drop(['Question_ID', 'Answer_ID'], axis=1)

Hi,
Sorry my question was bad… & Thank you for the codes!
I created the dataframe that I would like to create↓↓

Below is the original dataframe that I have(actual dataframe is much longer than this):

Question Answer Section_ID Person_Incharge
AD BB X Y
AD BB X Y
AD BB X Y
HG GG X Y
HG GG X Y
JI UI X Y
JI UI X Y
JI UI X Y
JI UI X Y
OP LL Z Q
RT TT Z Q
ER RR Z Q
ER RR Z Q
ER RR Z Q
ER RR Z Q

And below is the dataframe I would like to restructure(actual dataframe is much longer than this):

Section_ID Person_Incharge Question_1 Answer_1 Question_2 Answer_2 Question_3 Answer_3
X Y AD BB HG GG JI UI
X Y AD BB HG GG JI UI
X Y AD BB “Blank” “Blank” JI UI
X Y “Blank” “Blank” “Blank” “Blank” JI UI
Z Q OP LL RT TT ER RR
Z Q “Blank” “Blank” “Blank” “Blank” ER RR
Z Q “Blank” “Blank” “Blank” “Blank” ER RR
Z Q “Blank” “Blank” “Blank” “Blank” ER RR
1 Like

I see.

Hmm…it seems some of the question and answer pairs are interrelated, thus the complexity.

The only thing I can think of is a mix of identifying how the question&answer pairs relate to each other (e.g. AD:BB are linked to HG:GG and JI:UI), group those pairs together, and then maybe pd.concat or pd.merge them.

I’m sure there are better ways out there, but I’m not that much of a data wizard.

I’ll give it a think and see if I can come up with the code.

Hopefully, you or someone else would’ve figured out the solution by then, because I’m quite slow.

1 Like

Yes… this is difficult, so I’ve been trying to figure out for months…

The actual dataframe I have has more columns such as time_stamp( those are interrelated with question and answer), so joining these strings with “:” would be difficult.

I think the part that made this extremely tricky are:

  • fixing this error"KeyError: ‘Level Column_A not found’"
  • adding “blank” to Question and Answer rows

I will wait for the advice and in the mean time I will continue searching for the answer on google :sweat_smile:
Thank you for your message!

1 Like

Hello,

Okay, I tried and it seems to work.

But, I’m not sure if it’s what you want, even though the output looks similar to what you requested.

Trying(1).ipynb (53.3 KB)

Click here to view the jupyter notebook file in a new tab

Feel free to ask any questions especially if I’m wrong or anything.

1 Like

Hi,
Oh wow, you are amazing! This is exactly what I wanted to achieve!!

Thank you very much for everything and the detailed descriptions that will help me understand what you’re writing. Next homework for me is trying to understand what you’re writing in this code!
I didn’t want to use the code that I can’t understand because I will not be my learnings. (When the code works, but I don’T know how.)

My learnings:
*I didn’t know group works for string table. I thought It was used to calculate numbers such as groupby(). sum()
*Everything you wrote in the code is learning for me

I changed one code to be tailored for my dataset. (My dataset is actually much longer…)

This “Question” column’s value could include the exactly same value (i.e. we have 2 “AD” in the “Question” )

In that case, it will be like this below
↓this part of your code

# we can either concat or merge depending on our use case
# we start by identifying which pairs the question id and its question number

question_1 = ['AD', 'OP','AD']
question_2 = ['HG', 'RT','DD']
question_3 = ['JI', 'ER','AI']

output returned error

KeyError: 'DD'

Thus, I created the unique key ("Question "+ “Section_ID” + unique count)and added as a new column and used it as groupby column like below,

# Create new column: Key = Question_ID + Section_ID + Cuont Unique value number(to be extra unique, just in case, there's duplicate combination of Question_ID + Section_ID combination)
lists = df.values.tolist() 
for row in lists:
    Question_ID = row[0]
    Section_ID = row[2]
    key = Question_ID + "-" + Section_ID
    row.append(key)

df = pd.DataFrame(lists, columns=['Question_ID','Answer_ID', 'Section_ID','Person_Incharge', 'Key'])

df['Key1'] = pd.factorize(df['Key'])[0]
df['Key']= df['Key'].map(str) + " : " + df['Key1'].map(str)
df = df.drop('Key1', axis=1)

Then, I also wanted to create code to automate listing elements that goes to each column. (slice every3rd element from the list)

# we start by identifying which pairs the question id and its question number
# in oder to automate, select every 3rd the unique value
unique_list = list(df["Key"].unique())
list_of_3 = [unique_list[i::3] for i in range(3)]
question_1 = list_of_3[0]
question_2 = list_of_3[1]
question_3 = list_of_3[2]
question_1

Sorry I wanted to add my selfish feature on top of your perfect codes haha

As a python beginner, my codes includes a lot of unnecessary steps…, but I will learn how to write a code that is as simple as I can.

This is the final version of the code now!
Kanae.practice.py (3.0 KB)

Thank you for solving this and helping me from sleepless nights!!

p.s. This data I’ve collecting is from AWS Sagemaker data outputs, so having complex structure dataframe is always my pain. This restructuring dataset technique you taught me is super helpful for my data analysis practice!

2 Likes

No worries. I’m glad I could help.

To be honest, without the full dataset, it’s quite difficult to create a solution that fits your use case. But I’m glad you were able to change my code to work with the actual dataset.

Sorry I wanted to add my selfish feature on top of your perfect codes haha

As a python beginner, my codes includes a lot of unnecessary steps…, but I will learn how to write a code that is as simple as I can.

Thank you for the compliments haha. I’m quite new to Python myself, started around 2 weeks ago, so I doubt my codes are good. But at least it works.

I wish you the best in your learning endeavors.

1 Like