Practice Mode: Pandas Data Cleaning (regex = True)

Screen Link:

My question is based on the Practice Mode > Pandas Data Cleaning > 2. Cleaning Money Column.

My Code:

apps['price'] = apps['price'].str[1:].replace(',','.')

Proposed solution

apps['price'] = apps['price'].str[1:].replace(',', '.', regex=True)

What I expected to happen:
Assuming everything else is exactly like the proposed solution, I expected that regex=True would not be required, given that True is the default value of the regex parameter of Series.str.replace().

What actually happened:

ValueError: could not convert string to float: '33,30'
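A likely cause worth checking: `apps['price'].str[1:]` already returns an ordinary Series, so the `.replace(',', '.')` chained after it is `Series.replace`, not `Series.str.replace`. With its default `regex=False`, `Series.replace` only replaces cells whose *whole value* equals `','`, so `'33,30'` is left as it is and the later float conversion fails. Passing `regex=True` switches `Series.replace` to pattern matching inside each string, which is why the proposed solution works. The sketch below uses a small made-up Series, since the real `apps` data is not shown; the leading `'€'` is only an assumed one-character currency symbol.

```python
import pandas as pd

# Hypothetical sample data mimicking the "33,30" values from the error message;
# the first character stands in for a one-character currency symbol.
prices = pd.Series(['€33,30', '€1,99', '€0,00'])

stripped = prices.str[1:]  # drop the first character; the result is a plain Series

# Series.replace compares whole cell values: ',' only matches a cell equal to ','.
whole_value = stripped.replace(',', '.')

# Series.replace with regex=True treats ',' as a pattern and substitutes inside each string.
pattern_based = stripped.replace(',', '.', regex=True)

# Series.str.replace works on substrings; regex=False makes the literal match explicit.
str_method = stripped.str.replace(',', '.', regex=False)

print(whole_value.tolist())    # ['33,30', '1,99', '0,00']
print(pattern_based.tolist())  # ['33.30', '1.99', '0.00']
print(str_method.tolist())     # ['33.30', '1.99', '0.00']
print(str_method.astype(float).tolist())  # [33.3, 1.99, 0.0]

# Converting the unchanged values reproduces the kind of error reported above.
try:
    whole_value.astype(float)
except ValueError as err:
    print(err)
```

Either `.replace(',', '.', regex=True)` or `.str.replace(',', '.')` fixes the column; the `.str` version is arguably clearer about the intent of editing text inside each value.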

I have spent some time reading about regex, but because of the error above, it is still not clear to me how or when I should include regex=True.

I appreciate any tips and feedback.

Many thanks!

Hi @boemer00:

This has been mentioned in these 3 GitHub issues, and it looks like it may be rectified in a future version. A comment from that discussion:

Just noting the potential for this confusion was brought up when we added the regex parameter, though it didn’t generate much discussion: #16808 (comment)
At the time I noted that changing this behavior would break back-compat (since the undocumented behavior that had been there since the beginning was literal replacement for 1-character strings and regex replacement for >1 character strings).
I’m totally on board with changing either the documentation or the behavior to be more consistent, but it definitely needs a deprecation cycle as suggested by @TomAugspurger. The behavior of .str.replace('.', '') without regex specified to replace periods, rather than every character, has been constant since at least <=0.16.
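To make the literal-vs-regex distinction in that comment concrete, here is a small sketch with made-up strings. It passes `regex` explicitly every time, because both the default (to my knowledge it became `False` in pandas 2.0) and the single-character special case described above have changed across versions, so an unescaped `'.'` with `regex=True` is deliberately avoided here. A compiled pattern is always treated as a regular expression, which is used to show the "every character" case.

```python
import re

import pandas as pd

# Hypothetical sample strings containing periods.
versions = pd.Series(['1.2.3', '0.16'])

# regex=False: '.' is a literal period, so only the periods are removed.
literal = versions.str.replace('.', '', regex=False)

# Escaping the dot makes it a literal period inside a regular expression.
escaped = versions.str.replace(r'\.', '', regex=True)

# A compiled pattern is always a regex, so '.' matches (and removes) every character.
wildcard = versions.str.replace(re.compile('.'), '', regex=True)

print(literal.tolist())   # ['123', '016']
print(escaped.tolist())   # ['123', '016']
print(wildcard.tolist())  # ['', '']
```

Writing `regex=True` or `regex=False` explicitly is a simple way to keep code like this behaving the same regardless of which pandas version runs it.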
