str.replace() vs re.sub()

My Code:

def normalize_values(string):
    string = string.replace("[^\w\s]", "")
#     try:
#         string = int(string)
#     except:
#         string = 0
    return string

jeopardy["clean_value"] = jeopardy["Value"].apply(normalize_values)

Can someone tell me why jeopardy["clean_value"] is the same as jeopardy["Value"] given my code above, but I get the intended result of stripping the $ from the jeopardy["Value"] column when I use re.sub("[^\w\s]", "", string) instead? Thank you!!

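It’s because str.replace() doesn’t accept regular expressions. It treats "[^\w\s]" as literal text to search for, and since that exact sequence of characters never appears in your values, nothing gets replaced. You need to use re.sub().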
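
Here is a minimal sketch of the difference. The sample value "$2,000" is made up for illustration and is not taken from the dataset:

```python
import re

value = "$2,000"  # hypothetical sample value

# str.replace() searches for the literal text "[^\w\s]".
# That text is not in the value, so the string comes back unchanged.
literal_result = value.replace(r"[^\w\s]", "")

# re.sub() interprets the same text as a pattern: any character that is
# neither a word character nor whitespace.
regex_result = re.sub(r"[^\w\s]", "", value)

print(literal_result)  # $2,000
print(regex_result)    # 2000
```

Note that the pattern removes the comma as well as the dollar sign, since neither is a word character.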

Hello @spi

In this case, string is a plain Python str, and its replace() method only accepts a literal substring, not a regular expression.

The good news is that pandas provides a .str.replace() method for Series, which accepts a regular expression when you pass regex=True. That lets you replace the pattern across the whole column without apply():

jeopardy["clean_value"] = jeopardy["Value"].str.replace(r"[^\w\s]", "", regex=True)
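
A small sketch of how this might look end to end, assuming pandas is installed. The Series below is a hypothetical stand-in for jeopardy["Value"], and the integer conversion is only one possible way to do what the commented-out try/except in the question was aiming for:

```python
import pandas as pd

# Hypothetical stand-in for the jeopardy["Value"] column.
values = pd.Series(["$200", "$1,000", "None"])

# regex=True: the pattern is treated as a regular expression.
cleaned = values.str.replace(r"[^\w\s]", "", regex=True)

# regex=False: the pattern is treated as literal text,
# which mirrors the behaviour of Python's built-in str.replace().
unchanged = values.str.replace(r"[^\w\s]", "", regex=False)

# Convert to integers, using 0 wherever the cleaned text is not a number.
numeric = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype(int)

print(cleaned.tolist())    # ['200', '1000', 'None']
print(unchanged.tolist())  # ['$200', '$1,000', 'None']
print(numeric.tolist())    # [200, 1000, 0]
```

Passing regex explicitly is worth the extra keystrokes, because the default for this parameter has differed between pandas versions.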