My Code:
laptops['weight'] = laptops['weight'].str.replace('kg','').str.replace('kgs','').astype(float)
laptops.rename({'weight':'weight_kg'}, axis = 1, inplace = True)
laptops.to_csv('laptops_cleaned.csv', index = False)
What actually happened:
I got an error message
ValueError: could not convert string to float: '4s'
I noticed that the proposed solution uses a different order for str.replace(): it replaces ‘kgs’ first and then ‘kg’. I tried running the solution (it worked fine), so is the order what I got wrong, or is there something else that I did not notice?
laptops["weight"] = laptops["weight"].str.replace("kgs","").str.replace("kg","").astype(float)
thanks 
Hi @boemer00,
Yes, the order from left to right matters here.
In your case, when you first replace kg with an empty string, every kgs in your column is left with only the s. So when your second replace() runs, it can't find any kgs, because they no longer exist. Later, the values that still contain this leftover s cannot be converted to float.
That's why you have to use either the order suggested in the solution (first kgs, then kg) or, if you want to replace kg first, have the second replace() remove the remaining s:
laptops['weight'] = laptops['weight'].str.replace('kg','').str.replace('s','').astype(float)
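To see the effect of the order in isolation, here is a minimal sketch in plain Python (no pandas). The sample values and the helper strip_units are hypothetical, made up for illustration, and the loop only mimics what chained .str.replace() calls do with literal strings:

```python
# Hypothetical sample values, not taken from the laptops dataset.
weights = ["1.37kg", "4kgs", "2.2kg"]

def strip_units(value, units):
    """Remove each unit in order, mimicking chained .str.replace() calls."""
    for unit in units:
        value = value.replace(unit, "")
    return value

kg_first = [strip_units(w, ["kg", "kgs"]) for w in weights]
kgs_first = [strip_units(w, ["kgs", "kg"]) for w in weights]
kg_then_s = [strip_units(w, ["kg", "s"]) for w in weights]

print(kg_first)   # ['1.37', '4s', '2.2']  <- the stray 's' survives
print(kgs_first)  # ['1.37', '4', '2.2']
print(kg_then_s)  # ['1.37', '4', '2.2']

# The leftover 's' is what breaks the float conversion.
try:
    float(kg_first[1])
except ValueError as err:
    print(err)  # could not convert string to float: '4s'
```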
Got it! Thank you so much 
Is there a way to put two replacements in one replace statement?
For example (pseudocode):
laptops['weight'].str.replace('kg',''; "s","").str.replace('s','').astype(float)
or
laptops['weight'].str.replace('kg','' and "s","").str.replace('s','').astype(float)
or
laptops['weight'].str.replace('kg','' **something else here** "s","").str.replace('s','').astype(float)
Hi @drill_n_bass,
No, unfortunately the syntax of this method doesn’t have this option.
Good to know. Thank you for the feedback!
I mean, either of the following, chaining two str.replace() calls in one line, works in this situation:
laptops['weight'] = laptops['weight'].str.replace('kg','').str.replace('s','').astype(float)
or
laptops['weight'] = laptops['weight'].str.replace('kgs', '').str.replace('kg', '').astype(float)
I know, I just wonder if there is a way to get rid of one of the str.replace() calls, so the code is simpler and shorter.
Looks like there’s a way to achieve it:
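s.str.replace('kgs|kg', '', regex=True)
It works because Series.str.replace accepts a regex pattern, and the | does an OR. (Passing regex=True explicitly matters on pandas 2.0 and later, where the default is regex=False.)
Test snippet:
import pandas as pd
test = pd.Series(['weight in kg', 'weight in kgs', 'qwertykgs'])
print(test.str.replace('kgs|kg', '', regex=True))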
It works!!! 
laptops['weight'] = (laptops['weight'].str.replace('kg|s', '', regex=True)
                                      .astype(float)
                    )
I just wonder about one thing. I was certain that it would be better to use “&”/and than “|”/or. But when I wrote:
laptops['weight'] = (laptops['weight'].str.replace('kg&s','')
                                      .astype(float)
                    )
…the code was rejected. I’m not sure why.
With “|”/or, I thought the logic was that it would remove just one of “kg” or “s” at random, so the cleanup would be incomplete: for a “kgs” string it would take only “kg”, and “or” would not take both. On the other hand, “&”/and probably has a similar problem (and that is likely my actual error): for a plain “kg” string there is no “s” to find, so nothing would match.
Great that you tried it out even after 2 weeks!
As you may already know, in regex some characters have a special meaning, including |, known as alternation, which acts as a boolean OR and tries its alternatives from left to right.
To answer your question, here’s what I want you to try (and don’t miss the pattern 'kg[s]?' in the last print):
import pandas as pd
s = pd.Series(['weight in kg', 'weights in kgs', 'yesss kgs', 'did you miss kg&s?'])
print(':ROUND 1:', s.str.replace('kgs|kg', '', regex=True), sep='\n') # Checks kgs first
print()
print(':ROUND 2:', s.str.replace('kg|kgs', '', regex=True), sep='\n') # Checks kg first
print()
print(':ROUND 3:', s.str.replace('kg|s', '', regex=True), sep='\n') # Removes every kg and every s
print()
print(':ROUND 4:', s.str.replace('kg&s', '', regex=True), sep='\n') # Matches only the literal text kg&s
print()
print(':ROUND 5:', s.str.replace('kg[s]?', '', regex=True), sep='\n') # Checks kg with/without single s
Both 'kgs|kg' and 'kg[s]?' work.
In regex, & has no special meaning, so 'kg&s' only matches the literal text kg&s. With 'kg&s', I suppose you were looking for 'kg[s]?'? ([s]? looks for 0 or 1 occurrence of s.)
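If you don't have pandas at hand, the same patterns can be compared with Python's built-in re module. This is only a sketch: the sample strings are hypothetical, and it assumes the regex mode of Series.str.replace behaves like re.sub for these simple patterns. 'kgs?' is an equivalent spelling of 'kg[s]?'.

```python
import re

# Hypothetical sample strings for illustration.
samples = ["4kgs", "2.2kg", "kg&s"]

patterns = ["kgs|kg", "kg|kgs", "kg|s", "kg&s", "kgs?"]

# For each pattern, remove every match from every sample string.
results = {p: [re.sub(p, "", s) for s in samples] for p in patterns}

for pattern, cleaned in results.items():
    print(f"{pattern!r:10} -> {cleaned}")

# 'kgs|kg'   -> ['4', '2.2', '&s']
# 'kg|kgs'   -> ['4s', '2.2', '&s']     (kg wins first, the s is left behind)
# 'kg|s'     -> ['4', '2.2', '&']       (every kg and every s is removed)
# 'kg&s'     -> ['4kgs', '2.2kg', '']   (only the literal text kg&s matches)
# 'kgs?'     -> ['4', '2.2', '&s']
```

Note that 'kg|s' also gives clean numbers here, but it removes any s anywhere in the string, so 'kgs|kg' or 'kgs?' is the more targeted choice.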
Totally share this opinion too.