Regular expression

Hello. The following code removes the () from the column names, but I don't understand how the pattern works. Why do we have to include the []? Can someone please explain?

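# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as literal text by default
happiness2015.columns = happiness2015.columns.str.replace(r'[()]', '', regex=True)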

Hey, Helen. Can you please edit your question to include the link to the screen this refers to?

Regarding your question, do you know what the square brackets do?

Here’s the link - https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/3/correcting-data-cleaning-errors-that-result-in-missing-values

From my understanding, the pattern has to be an exact match; otherwise, it is ignored. However, including the square brackets removes the round brackets even when there are other characters between them. This confused me, as it wasn't covered in the lesson.
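The difference between "exact match" and what the square brackets do can be seen with Python's standard re module. This is a small sketch, and the column name below is a hypothetical example in the style of the lesson's data, not taken from it: a literal replacement looks for the exact sequence "()", while [()] looks for either bracket on its own.

```python
import re

# Hypothetical column name with text between the round brackets
name = "Economy (GDP per Capita)"

# A literal replacement only matches the exact two-character sequence "()",
# which never occurs in this name, so nothing changes.
literal = name.replace("()", "")

# Inside square brackets, "(" and ")" are alternatives: each one is
# matched on its own, wherever it appears in the string.
with_class = re.sub(r"[()]", "", name)

print(literal)     # Economy (GDP per Capita)
print(with_class)  # Economy GDP per Capita
```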

You’re right! It wasn’t. This is a mistake on our end.

That's part of the magic of square brackets in regular expressions. The pattern [<char1><char2><char3>] (where < > are placeholders for characters) is a character class: it matches any single occurrence of char1, char2 or char3. So [()] matches whenever it encounters either ( or ), and each of those matches is replaced on its own, no matter what sits between the brackets.
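To see that a character class matches one character at a time, here is a quick sketch with Python's re module (the sample strings are made up for illustration). It also shows that inside a class the round brackets need no escaping, whereas outside one you would have to write \( and \) and join them with |.

```python
import re

# Hypothetical column name used only for illustration
text = "Health (Life Expectancy)"

# [()] matches exactly one character per match: either "(" or ")"
matches = re.findall(r"[()]", text)
print(matches)  # ['(', ')']

# Equivalent pattern without a class: escaped brackets joined by "or" (|)
same_result = re.sub(r"\(|\)", "", text) == re.sub(r"[()]", "", text)
print(same_result)  # True

# The same idea works for any set of characters: [abc] matches a, b or c
print(re.findall(r"[abc]", "cabbage"))  # ['c', 'a', 'b', 'b', 'a']
```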

You’ll have a chance to learn more about this later in the path.

I will mention, however, that it’s not necessary to know this to pass this screen. You can get by with more rudimentary techniques.
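As one example of a more rudimentary technique (our guess at the kind of approach meant, not necessarily the one the lesson expects), the two brackets can be removed with two plain string replacements and no regular expression at all. The column names here are hypothetical.

```python
# Hypothetical column names for illustration
columns = ["Economy (GDP per Capita)", "Health (Life Expectancy)", "Happiness Score"]

# Two literal replacements: first drop "(", then drop ")"
cleaned = [c.replace("(", "").replace(")", "") for c in columns]
print(cleaned)
```

With pandas, the same idea would be chaining two calls on the column index, each with regex=False so the brackets are treated as plain text.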

Thank you for taking the time to respond, Bruno. I just wanted to make sure I understood what was going on before I moved to the next thing. :smile:
Now I get it. Appreciate your input!