Function for replacing parentheses not working like this?

Screen Link:
https://app.dataquest.io/c/62/m/351/cleaning-and-preparing-data-in-python/4/cleaning-the-nationality-and-gender-columns

I wanted to use a function like this to save a few lines compared with the solution you get by following the instructions.

I tried to turn the problem into a function, without success:

My Code:

def remove_parentheses(dataset, idx):
    for row in dataset:
        value = row[idx]
        value.replace("(", "")
        value.replace(")", "")
        row[idx] = value
    return dataset

moma = remove_parentheses(moma, 2)
moma = remove_parentheses(moma, 5)

Then I thought that, even though row[idx] was reassigned, the change might not have been applied to dataset inside the function, so I came up with this short edit:

def remove_parentheses(dataset, idx):
    num_row = 0
    for row in dataset:
        value = dataset[num_row][idx]
        value.replace("(", "")
        value.replace(")", "")
        dataset[num_row][idx] = value
        num_row += 1
    return dataset

moma = remove_parentheses(moma, 2)
moma = remove_parentheses(moma, 5)

What’s wrong with my approach?

I don’t think there is anything wrong with your approach; the problem is an assignment issue. Specifically, the lines that call str.replace() have a fatal flaw: this method does not modify the string in place (Python strings are immutable) but rather returns a new string with the replacement applied.

For example:

test = 'The answer to life, the universe, and everything is __.'
test.replace('__', '42')
print(test)

Output:
The answer to life, the universe, and everything is __.

With this in mind, I modified your “first version” appropriately and it worked great!

Let me know if this isn’t enough of a hint and we can work on it together some more.

EDIT:
Just as an additional coding tip, for your second version of the function, you could clean up your for loop and its associated variables like so:

def remove_parentheses(dataset, idx):
    for i, row in enumerate(dataset):
        value = dataset[i][idx]
        value.replace("(", "")
        value.replace(")", "")
        dataset[i][idx] = value
    return dataset

moma = remove_parentheses(moma, 2)
moma = remove_parentheses(moma, 5)

The enumerate() function comes in really handy when you want to loop over an iterable while also keeping a counter for each iteration.
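As a minimal sketch of what enumerate() yields, here is a loop over a small made-up list (the rows name and its contents are hypothetical, not the real moma data). Each iteration gives a counter and the item together:

```python
# Hypothetical stand-in for a list-of-lists dataset.
rows = [["first", "(x)"], ["second", "(y)"]]

# enumerate() pairs each item with a counter that starts at 0.
for i, row in enumerate(rows):
    print(i, row)

# Collected into a list, the pairs are (index, item) tuples.
pairs = list(enumerate(rows))
```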

Thank you for your reply and the tip about enumerate(), @mathmike314 !

It seems it was just too late in the day for me to spot the reasons for my mistakes. In the end it was almost trivial :).
I believe you wanted to add enumerate() into my first suggestion, however. In the second one I left out the row parameter completely.

And of course your code is not working either, but this is due to the error you pointed out earlier.

EDIT:
Actually, it doesn’t make a lot of sense to use enumerate() in my first version, either, when I don’t want to iterate over the index. And in your suggestion it’s not fully necessary because row is omitted, though it helps by indexing the row.

I don’t think enumerate() would work well with your first suggestion because you don’t need a counter there but it could work for your second one (I removed the num_row variable from your code because enumerate() does that work for us).

So, did you manage to figure out how to correct the code? Does my example below show you why it’s not working?

test = 'The answer to life, the universe, and everything is __.'
test.replace('__', '42')
print(test)

Output:
The answer to life, the universe, and everything is __.

As you can see, I replaced the underscores (__) in the variable test but when I try to print this variable afterwards, the string has not changed from its original value. That’s because this function does not update the variable in place; it returns a modified copy which means we need to assign it back to the variable if we want it to be updated.
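To make the "assign it back" step concrete, here is a small sketch using the same test string. The before_assignment name is only there for illustration:

```python
test = 'The answer to life, the universe, and everything is __.'

# The return value is discarded here, so test keeps its original value.
test.replace('__', '42')
before_assignment = test

# Assigning the returned string back to the name updates the variable.
test = test.replace('__', '42')
print(test)
```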

Sure I did! As I said, it had just gotten too late for me to spot my errors :smiley:

For my second version, enumerate() is nice because it replaces my hand-counted index.
But I blanked out the row with _ since it’s not needed:

def remove_parentheses(dataset, idx):
    for i, _ in enumerate(dataset):
        value = dataset[i][idx]
        value = value.replace("(", "")
        value = value.replace(")", "")
        dataset[i][idx] = value
    return dataset

moma = remove_parentheses(moma, 2)
moma = remove_parentheses(moma, 5)

While this would technically work, it doesn’t read very well since (as you say) you don’t need the row data anymore with this strategy. I think it would be better to use something like for i in range(len(dataset)) rather than for i, _ in enumerate(dataset).
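For reference, a sketch of the range(len(dataset)) variant could look like the following. It also includes a row-based variant (the name remove_parentheses_by_row is hypothetical) to show that the first version works too once the result of replace() is assigned back, because row refers to the same list object that is stored in dataset. The sample rows are made up and only stand in for the real moma data.

```python
def remove_parentheses(dataset, idx):
    """Strip "(" and ")" from column idx of every row, in place."""
    for i in range(len(dataset)):
        value = dataset[i][idx]
        # str.replace() returns a new string, so the result is assigned back.
        value = value.replace("(", "").replace(")", "")
        dataset[i][idx] = value
    return dataset


def remove_parentheses_by_row(dataset, idx):
    """Hypothetical variant of the first version: iterate over rows directly."""
    for row in dataset:
        # row is the same list object stored in dataset, so assigning to
        # row[idx] updates dataset as well.
        row[idx] = row[idx].replace("(", "").replace(")", "")
    return dataset


# Made-up sample rows standing in for the real moma data.
sample = [
    ["a", "b", "(American)", "c", "d", "(Male)"],
    ["e", "f", "(Spanish)", "g", "h", "(Female)"],
]
sample = remove_parentheses(sample, 2)
sample = remove_parentheses_by_row(sample, 5)
print(sample)
```

Both functions modify the lists in place, so the return value is only a convenience for the moma = remove_parentheses(moma, 2) style of call.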