Star Wars Survey - Cleaning and Mapping Checkbox Columns

#201-4 #201

My Code:

I checked the solution, but I am curious whether there is another way to do this without typing out each column and assigning it to a dictionary, for example by using a for loop.

for col in star_wars.columns[3:9]:
    for i in range(1,7):
        star_wars = star_wars.rename(columns = {
            col: "seen_{}".format(i)
        })

What I expected to happen:
I expected the code to give each column a new name of the form seen_(i), with i increasing by 1 for each column. Something like this:

`col_1` --> `seen_1`
`col_2` --> `seen_2`
...
`col_6` --> `seen_6`

What actually happened:
After I print the columns:

print(star_wars.columns[3:9])
Index(['seen_1', 'seen_1', 'seen_1', 'seen_1', 'seen_1', 'seen_1'], dtype='object')

The inner loop, for i in range(1,7):, is the source of the bug. You are trying to rename each column 6 times, and only the first attempt succeeds.

For example:

Outer loop (Iteration 1)
col = “Which of the following Star Wars films have you seen? Please select all that apply.”

Inner loop (iteration 1) :

i = 1

Column with the value of col is renamed into seen_1.

Inner loop (iteration 2) :

i = 2

Pandas will try to rename a column named col, which is "Which of the following Star Wars films have you seen? Please select all that apply.". But we have already renamed that column to seen_1, so the name can't be found. By default rename silently skips labels it can't find, so the column is not renamed to seen_2.

Inner loop (iterations 3-6)

The same thing happens. The column stays stuck at seen_1 because pandas can no longer find a column with the name stored in col ("Which of the following Star Wars films have you seen? Please select all that apply.").

Outer loop (further iterations)

The same thing happens to the other columns. Each one is successfully renamed to seen_1 on the first inner iteration, and every later rename fails to find it.
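The walkthrough above can be reproduced on a small scale. This is only a sketch: the toy DataFrame and its column names (id, film_a, film_b) are made up for illustration and are not the survey data.

```python
import pandas as pd

# Toy stand-in for the survey data; these column names are hypothetical.
df = pd.DataFrame([[0, 1, 2]], columns=["id", "film_a", "film_b"])

for col in df.columns[1:3]:
    for i in range(1, 3):
        # Only i == 1 finds `col`. On later passes the old name is gone,
        # and rename() skips unknown labels without raising an error.
        df = df.rename(columns={col: "seen_{}".format(i)})

result_columns = list(df.columns)
print(result_columns)  # ['id', 'seen_1', 'seen_1']
```

Both renamed columns end up as seen_1, the same pattern as in the output from the question.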

To fix this, you can either keep a counter outside the loop and increment it at the end of each iteration:

index = 1
for col in star_wars_copy.columns[3:9]:
    star_wars_copy = star_wars_copy.rename(columns={col: "seen_{}".format(index)})
    index += 1
star_wars_copy

Or use something like zip to get both number and column name:

seen_columns = star_wars.columns[3:9]
# start the range at 1 so the names run from seen_1 to seen_6, not seen_0 to seen_5
for idx, col in zip(range(1, len(seen_columns) + 1), seen_columns):
    star_wars = star_wars.rename(columns={col: "seen_{}".format(idx)})

Or dictionary comprehension:

# somewhat clean (named new_names so it does not shadow the built-in dict)
new_names = {col: "seen_{}".format(i) for col, i in zip(star_wars_copy.columns[3:9], range(1,7))}
star_wars_copy.rename(columns = new_names, inplace=True)
# or make it an incomprehensible one-liner 
star_wars_copy.rename(columns = {col: "seen_{}".format(i) for col, i in zip(star_wars_copy.columns[3:9], range(1,7))}, inplace=True)
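Another option, not mentioned above, is enumerate with start=1, which pairs each column with its number without a separate range. A minimal sketch, again on a made-up toy DataFrame rather than the real survey:

```python
import pandas as pd

# Toy stand-in for the survey data; these column names are hypothetical.
df = pd.DataFrame([[0, 1, 2, 3]], columns=["id", "film_a", "film_b", "film_c"])

# Build the whole old-name -> new-name mapping first, then rename once.
new_names = {
    col: "seen_{}".format(i)
    for i, col in enumerate(df.columns[1:4], start=1)
}
df = df.rename(columns=new_names)

renamed = list(df.columns)
print(renamed)  # ['id', 'seen_1', 'seen_2', 'seen_3']
```

Because the mapping is built before any renaming happens, each old name is looked up exactly once, which avoids the problem from the nested loop.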

References:

zip

dictionary comprehension

This is so cool. Thank you very much @wanzulfikri

No worries. Glad I could help.
