Column reassignment in 4. Cleaning the Nationality and Gender Columns

Screen Link: https://app.dataquest.io/m/434/cleaning-and-preparing-data-in-python/4/cleaning-the-nationality-and-gender-columns

Can someone explain to me why the final row works as written here:

for row in moma:
    nationality = row[2]
    nationality = nationality.replace("(","")
    nationality = nationality.replace(")","")
    row[2] = nationality

I feel like this part was kind of glossed over in the module. How come row[2] = nationality works but nationality = row[2] doesn’t? Thanks.

hey @shawn

The first line inside the for loop assigns the value of row[2] to the variable nationality.

Then the nationality variable undergoes data cleanup/transformation.

In the last line, the cleaned value held in the nationality variable is re-assigned to row[2], i.e. back to the 3rd column of each row in the “moma” dataset.

We can also do a direct cleanup/transformation of row[2]; here, however, it's done step by step using a variable.
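A small sketch of why the last line is needed. Python strings are immutable, so replace returns a new string and leaves row[2] as it was until the result is assigned back. The sample row is made up for illustration and is not taken from the real moma data.

```python
# Hypothetical sample row; the real moma rows have more columns.
row = ["Title A", "Artist A", "(American)"]

nationality = row[2]                        # nationality refers to the string stored in row[2]
nationality = nationality.replace("(", "")  # replace returns a NEW string
nationality = nationality.replace(")", "")

print(nationality)  # American
print(row[2])       # (American)  <- the row still holds the old string

row[2] = nationality  # store the cleaned string back in the row
print(row[2])       # American
```

Without the final assignment, only the name nationality would point at the cleaned string; the row would keep the parentheses.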

Thank you Rucha!

Can you please give me an example of direct cleanup/transformation?

hey @shawn

direct assignment of row[2] would look something like this:

for row in moma:
    row[2] = row[2].replace("(","").replace(")","") 

And the results would look like this:

[image: screenshot of the cleaned output, not reproduced here]
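As a rough stand-in for the output, here is the one-line version run on a few made-up rows. The moma list below is a hypothetical sample, not the real dataset.

```python
# Hypothetical stand-in for the moma list of lists.
moma = [
    ["Title A", "Artist A", "(American)"],
    ["Title B", "Artist B", "(French)"],
    ["Title C", "Artist C", "()"],
]

for row in moma:
    # Chain the two replace calls and store the result straight back in the row.
    row[2] = row[2].replace("(", "").replace(")", "")

for row in moma:
    print(row)
# ['Title A', 'Artist A', 'American']
# ['Title B', 'Artist B', 'French']
# ['Title C', 'Artist C', '']
```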

Hello @Rucha, how does modifying the row update the original moma dataset? Can you explain, please?

hey @Rjx

When it comes to assigning a value, the right-hand side is evaluated first and the result is then assigned to the name on the left. For example, x = 5 is read as “the variable x equals 5”, but it is executed as “5 becomes the value of the variable x”.

So the right-hand side, row[2].replace("(","").replace(")",""), produces the cleaned value first; let’s call it Updated Row.

Then the row[2] = Updated Row part of the code re-assigns the modified value: Updated Row is the new value for row[2].

This happens for every row in the moma dataset; the for loop takes care of running this code for each and every row.
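The two steps can be written out separately to make the order visible. This is only a sketch; the row and the name updated are made up for illustration.

```python
x = 5
x = x + 1   # the right side (5 + 1) is evaluated first, then x is bound to the result
print(x)    # 6

row = ["Title A", "Artist A", "(American)"]  # hypothetical row

updated = row[2].replace("(", "").replace(")", "")  # step 1: compute the new value
row[2] = updated                                     # step 2: store it at position 2
print(row)  # ['Title A', 'Artist A', 'American']
```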

Let me know if this still doesn’t help you.

Hey,
Thanks for your quick reply.

I understand how variable assignment and for loops work. I wanted to understand how the moma dataset gets updated when we only updated the row variable.
In the piece of code below, we assigned the nationality to a variable, updated it, and pushed it back to the row variable, but we never pushed the row variable back to the moma variable, and still it got updated.

for row in moma:
    nationality = row[2]
    nationality = nationality.replace("(",'')
    nationality = nationality.replace(")",'')
    row[2] = nationality

hey @Rjx

I suspected this while writing the previous post. Should have listened to my intuition. My bad!

The for loop itself is what lets the moma dataset update. Each time the loop runs, row is not a copy of a row: it refers to the very same list that is stored inside moma. So when an element of row is transformed in place, as with row[2] = nationality, the change is made to the list that moma already holds.
This sharing comes from the loop header itself:

for each_row in moma: each_row is one of moma’s own rows, so you don’t need explicit code to reassign the row back to moma.
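One way to check what the loop variable actually is: compare it with the matching element of moma using `is`, which tests whether two names refer to the same object. The sample data and the use of enumerate are additions for illustration only.

```python
# Hypothetical stand-in for the moma list of lists.
moma = [
    ["Title A", "Artist A", "(American)"],
    ["Title B", "Artist B", "(French)"],
]

for index, row in enumerate(moma):
    # row and moma[index] are the same list object, not two copies.
    print(row is moma[index])  # True
    row[2] = row[2].replace("(", "").replace(")", "")

print(moma)
# [['Title A', 'Artist A', 'American'], ['Title B', 'Artist B', 'French']]
```

Because both names point at one list, assigning to row[2] is visible through moma straight away.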

This is a very layman’s example, but it may be of some help in visualizing the “for loop”.

A basket has 5 types of fruit, each type has 5 units packaged together, and the third fruit in each packet has a sticker on it, something like this:

Fruit Type A - [A1, A2, A3(sticker), A4, A5]
....
Fruit Type E - [E1, E2, E3(sticker), E4, E5]

A person who is removing these stickers would:

  • take out one packet at a time
  • open it only at the third fruit position
  • remove the sticker
  • re-seal the pack and
  • place the packet ----------?

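the blank here is “back in the same basket”. The person is not using a new basket, and is not working on a copy of the packet either: in Python terms, row is the same list object that moma holds, so changing row[2] changes moma.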
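To stretch the analogy a little, here is a sketch contrasting two things that look similar. Giving the name row a brand-new list is like filling a different packet and never putting it in the basket; changing row[2] opens the packet that is already in the basket. The one-row moma list and the names after_rebinding and after_mutating are hypothetical.

```python
moma = [["Title A", "Artist A", "(American)"]]  # hypothetical one-row dataset

for row in moma:
    # Rebinding: the name row now points at a brand-new list; moma is not touched.
    row = ["Title A", "Artist A", "American"]
after_rebinding = moma[0][2]
print(after_rebinding)  # (American)

for row in moma:
    # Mutating: this changes the list that moma already holds.
    row[2] = "American"
after_mutating = moma[0][2]
print(after_mutating)  # American
```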

That’s how the for loop works here as well. Let me know if this still doesn’t help you. I won’t quit. I might redirect your query to some other Community Moderator :stuck_out_tongue: though!

Hi @Rucha, Thanks for the explanation. It was helpful.
