Replacing value in list of lists after looping

Screen Link:
https://app.dataquest.io/m/351/cleaning-and-preparing-data-in-python/4/cleaning-the-nationality-and-gender-columns

I wrote the code below following the instructions, and the output was correct.

My Code:

for row in moma:
    nationality = row[2]
    gender = row[5]
    
    nationality = nationality.replace("(", "")
    nationality = nationality.replace(")", "")
    row[2] = nationality
    
    gender = gender.replace("(", "")
    gender = gender.replace(")", "")
    row[5] = gender
    
print(moma[:7])

But I wanted to achieve the same result using a nested loop, as follows:

for row in moma:
    for item in row:
        item = item.replace("(", "")
        item = item.replace(")", "")
        # print(item) # Up to this line the output is okay
        
print(moma) # But at this line I get the original list, not the updated list without parentheses that I expected

What I expected to happen:
I expected an updated list without the parentheses.

What actually happened:
I got the original list, still with the parentheses.

[['Dress MacLeod from Tartan Sets', 'Sarah Charlesworth', '(American)', '(1947)', '(2013)', '(Female)', '1986', 'Prints & Illustrated Books'], ['Duplicate of plate from folio 11 verso (supplementary suite, plate 4) from ARDICIA', 'Pablo Palazuelo', '(Spanish)', '(1916)', '(2007)', '(Male)', '1978', 'Prints & Illustrated Books'], ['Tailpiece (page 55) from SAGESSE', 'Maurice Denis', '(French)', '(1870)', '(1943)', '(Male)', '1889-1911', 'Prints & Illustrated Books'], ['Headpiece (page 129) from LIVRET DE FOLASTRIES, À JANOT PARISIEN', 'Aristide Maillol', '(French)', '(1861)', '(1944)', '(Male)', '1927-1940', 'Prints & Illustrated Books'], ['97 rue du Bac', 'Eugène Atget', '(French)', '(1857)', '(1927)', '(Male)', '1903', 'Photography'],

Thank you for taking the time to read this post.

You did not assign the updated value back into the list. Python strings are immutable, so string methods such as replace() never change a string in place; they return a new string object, which generally lives at a different memory address. You can check this with id() (a good debugging tool when you study generators too).
You think it is the same item because you gave it the same name, but it isn't necessarily. Besides getting the value and the type() right, use id() to check whether it's still the same object. Such mistakes usually occur when dealing with mutable data structures (lists for a start, but also pandas DataFrames and instances of user-defined classes).
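A minimal sketch of the point above, using a hypothetical two-item row rather than the real moma data: rebinding the loop variable gives the name a new string object, while the list slot keeps pointing at the original string.

```python
# Hypothetical sample row (not the real moma dataset)
row = ["(American)", "(Female)"]

for item in row:
    original_id = id(item)           # identity of the string stored in the list
    item = item.replace("(", "")     # replace() returns a new string object...
    # ...so the name `item` now refers to a different object than the list slot
    print(item, id(item) == original_id)

print(row)  # the list itself is unchanged: ['(American)', '(Female)']
```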

Thank you for replying. I hate to say it, but I didn't understand what you meant. Could you please give me some examples of assigning the updated value back into the list?

When talking about the sameness of variables, we can begin by thinking of 3 things:

  1. the object/value it is pointing to (e.g. 'hi', 2, 1.232, True, 4+2j, datetime.date(2019, 12, 4))
  2. the type() of the object/value
  3. the id() of the object/value (in CPython, its memory address)

Try going through this first to understand Python's name-binding mechanism; it explains the type() and id() I referred to: https://realpython.com/python-variables/#object-references
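As a rough illustration of those three checks (the variable names are made up for this example): `==` compares values, `type()` compares types, and `is` compares identity, which is the same as comparing `id()` results. The second half shows why identity matters for mutable objects.

```python
# 1. value, 2. type, 3. identity of two equal strings
a = "hello world"
b = "".join(["hello", " ", "world"])  # builds an equal string at runtime

print(a == b)              # compares values: True
print(type(a) is type(b))  # compares types: True
print(a is b)              # compares identity, i.e. id(a) == id(b); usually False in CPython

# With a mutable list, two names can share one object
x = [1, 2]
y = x          # y is bound to the same list object as x
y.append(3)    # mutating through y...
print(x)       # ...is visible through x: [1, 2, 3]
print(x is y)  # True
```

Whether two equal strings are the same object is an implementation detail, so the sketch only relies on `==` for strings and on `is` for the list that was deliberately shared.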

When you do row[2] = nationality, you are storing the value of nationality in the list: row is the list, and row[2] = ... calls row.__setitem__(2, nationality), which makes the 3rd element of the list point to the same object that nationality points to. FYI, when [] appears on the right-hand side of an assignment, it is usually syntactic sugar for __getitem__; when it appears on the left-hand side, it's __setitem__. Any class (including those you create yourself) can implement these methods if you want to allow [] to be used with its instances.
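To make the `[]` dispatch concrete, here is a sketch with a hypothetical class (`LoggedList` is invented for this example, not part of any library) that wraps a list and prints whenever `__getitem__` or `__setitem__` is called.

```python
class LoggedList:
    """Hypothetical list wrapper that logs every [] read and write."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        print(f"__getitem__({index!r})")
        return self._items[index]

    def __setitem__(self, index, value):
        print(f"__setitem__({index!r}, {value!r})")
        self._items[index] = value


row = LoggedList(["Sarah Charlesworth", "(American)"])
nationality = row[1]    # [] on the right-hand side -> __getitem__(1)
row[1] = nationality.replace("(", "").replace(")", "")  # left-hand side -> __setitem__
print(row._items)       # ['Sarah Charlesworth', 'American']

# For a built-in list, the explicit method call does the same thing as []
plain = ["a", "b"]
plain.__setitem__(1, "B")   # equivalent to plain[1] = "B"
print(plain)                # ['a', 'B']
```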

No such setting of list elements happens in item = item.replace("(", ""). During for item in row, item initially points to an element of the list; item.replace("(", "") then returns a new string at another memory address. It doesn't matter that you assigned this back to item: that only rebinds the name item. The item on the left-hand side is not the same object (in terms of memory address, not value) as the one extracted from the list in for item in row, and the list itself still points to the original string.

Here's an example. Note the changes in id(), which is a useful way to check where an object lives and whether it's still the same object.

ls = ['good', 'morning']
print(id(ls))
print(id(ls[0]))
print(id(ls[1]))

for word in ls:
    replaced = word.replace('o', '')
    print('word location:', id(word))
    print('replaced location:', id(replaced))

To keep things clearer, I did not assign the result back to word; replaced above is analogous to the item on the left-hand side of your assignment.

Hi @hanqi, thank you for your great effort. It was an in-depth post, and I've more or less understood the concept you described. However, I'm embarrassed to admit that I still don't know how to solve the problem at hand, i.e. how to turn the concept into code that updates the list values. What I want to know is the correct code to add to (or replace) my existing code so that the list values are updated with the new values inside the loop.
Forgive my ignorance.

The key idea is that you must assign back to the same place in row. To access that place, you need a positional index for a list (or a key for a dictionary). To get the position, you can use enumerate().

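for row in moma:
    for i, item in enumerate(row):      # i is the position of item in row
        item = item.replace("(", "")     # new string, bound to the name item
        row[i] = item.replace(")", "")   # write the cleaned string back into row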

for i, item in enumerate(row) unpacks each 2-element (index, value) tuple generated by enumerate(). You can read about tuple unpacking (multiple assignment) if this is unfamiliar.
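A small sketch of what enumerate() produces and how the unpacked index is used to write back, using a shortened version of one of the rows printed in the question:

```python
# Shortened version of one of the rows printed in the question
row = ["97 rue du Bac", "Eugène Atget", "(French)", "(Male)"]

# enumerate() yields (index, value) tuples
print(list(enumerate(row)))

for i, item in enumerate(row):   # unpacks each tuple, like i, item = (0, "97 rue du Bac")
    row[i] = item.replace("(", "").replace(")", "")  # write back by position

print(row)  # ['97 rue du Bac', 'Eugène Atget', 'French', 'Male']
```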

My opinion is that it is confusing and bad practice to assign multiple transformations back to the same variable name. It's bad because when you come back to the code, it is not immediately clear which stage of transformation the variable is at (a common problem when working with pandas DataFrames interactively in Jupyter), unless you wrap the steps in a function, or in a cell that runs together from top to bottom. The advantage is that it can save a little memory compared to creating a new variable such as item_no_left_parenthesis, because the intermediate string can be freed as soon as the name is rebound. You can consider chaining instead:

for row in moma:
    for i, item in enumerate(row):
        row[i] = (item.replace("(", "")
                      .replace(")", ""))

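Nested list comprehensions are possible too, but they are not as easy to edit later as a for loop.
A list comprehension creates new lists, so you have to assign the result back to moma. Assigning to the slice moma[:] replaces the contents of the existing moma list (so every name that refers to it sees the cleaned data), but each inner row is a brand-new list. This differs from the previous example, which modifies every existing row in place.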

moma[:] = [[item.replace('(', '').replace(')', '') for item in row]
           for row in moma]
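A sketch of what the slice assignment does, on a tiny hypothetical stand-in for moma: the outer list keeps its identity (so another name bound to it sees the change), while the old inner row lists are left untouched and replaced by new ones.

```python
# Tiny stand-in for moma (hypothetical sample data)
moma = [["(French)", "(Male)"], ["(Spanish)", "(Male)"]]
alias = moma                 # a second name for the same outer list
first_row = moma[0]          # reference to the original first row
outer_id = id(moma)

# Slice assignment replaces the contents of the existing outer list
moma[:] = [[item.replace("(", "").replace(")", "") for item in row]
           for row in moma]

print(id(moma) == outer_id)  # True: still the same outer list object
print(alias)                 # [['French', 'Male'], ['Spanish', 'Male']]
print(first_row)             # ['(French)', '(Male)']: the old row was not modified
```

By contrast, a plain `moma = [...]` would rebind the name to a new outer list and leave `alias` pointing at the old, uncleaned data.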

Hi @hanqi, thank you so very much. That's what I wanted to know. Actually, I'm still at the fundamentals stage in Python, so I haven't gone that far yet. But it's good to learn some advanced concepts early.

Your replies were very helpful and educative. It will definitely help me in future. Through this discussion with you, I have learned some things. That’s a plus for me. Thank you once again for your time and effort.

In this lesson's instructions, I don't understand what the 300, 400 and 500 in the brackets are. Can someone explain to me what those numbers mean?

Thanks,

Erin

Hey Erin!

I believe these are just arbitrary index numbers to illustrate the changes made to the values in the Nationality column. Nothing you need to read into!

Best,
Dee

Hi @erin.mccool,

print(moma[300][2])
print(moma[400][2])
print(moma[500][2])

These numbers are indices that pick out a specific data point (cell) in the list of lists: 300, 400 and 500 are row positions, and 2 is the column position (the nationality column, since indexing starts at 0).
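An expression like moma[300][2] is evaluated left to right: moma[300] picks a row (itself a list), and [2] then picks an element of that row. Since the full dataset isn't available here, this sketch uses two shortened rows taken from the output printed earlier in the thread.

```python
# Two shortened rows taken from the output printed earlier in the thread
moma = [
    ["Dress MacLeod from Tartan Sets", "Sarah Charlesworth", "(American)"],
    ["Tailpiece (page 55) from SAGESSE", "Maurice Denis", "(French)"],
]

print(moma[1])     # row index 1 -> the whole second row
print(moma[1][2])  # row 1, column 2 -> (French)
```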

I hope that helps.

Regards,

Mahadi

In the explanation for this section it states:
nationalities = ['(American)', '(Spanish)', '(French)']
for n in nationalities:
    clean_open = n.replace("(", "")
    clean_both = clean_open.replace(")", "")


If we had used n.replace() both times, we would have lost the result of the first operation and ended up with strings that still had the ( character.
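The lesson's warning appears to be about which string the second replace() is applied to, rather than about reusing a variable name. A minimal sketch comparing the two cases (the lists wrong and right are names introduced only for this comparison):

```python
nationalities = ["(American)", "(Spanish)", "(French)"]
wrong = []
right = []

for n in nationalities:
    # Both calls start from the original n, so the first result is lost
    clean_open = n.replace("(", "")
    clean_both = n.replace(")", "")   # should have been clean_open.replace(...)
    wrong.append(clean_both)

    # Reusing one name is fine: each call starts from the previous result
    nationality = n
    nationality = nationality.replace("(", "")
    nationality = nationality.replace(")", "")
    right.append(nationality)

print(wrong)  # ['(American', '(Spanish', '(French']
print(right)  # ['American', 'Spanish', 'French']
```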

But in the for loop example, nationality.replace is used both times.
for row in moma:
    nationality = row[2]
    nationality = nationality.replace("(", "")
    nationality = nationality.replace(")", "")
    row[2] = nationality

So is it okay to use the same variable both times?
