questions['Tags'].str.replace("<", "").str.replace(">", ",")
What I expected to happen:
The output is as expected, but how do I remove the last comma in each string? I tried
questions['Tags'] = questions['Tags'][:-1]
and
questions['Tags'].rstrip()
For rstrip I got the error:
AttributeError: 'Series' object has no attribute 'rstrip'
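As the error points out, a Series does not have an rstrip method. You need to use str.rstrip() instead, since rstrip is a string method. Note that str.rstrip() with no argument only strips whitespace; to strip trailing commas, pass the character: str.rstrip(",").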
For questions['Tags'][:-1], you need to use str[:-1] as well; otherwise you are slicing the rows of the Series (which drops the last row) instead of the string value in each row. The str accessor is what applies the operation to the string value in each row.
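A minimal sketch of both fixes, assuming a small made-up questions DataFrame in the same <tag1><tag2> format (the sample values are hypothetical, not from the original data):

```python
import pandas as pd

# Hypothetical sample data in the same "<tag1><tag2>" format as the Tags column
questions = pd.DataFrame({"Tags": ["<python><pandas>", "<regex>"]})

# Step from the question: drop "<" and turn ">" into ","
# (regex=False makes the literal matching explicit)
tags = questions["Tags"].str.replace("<", "", regex=False).str.replace(">", ",", regex=False)
print(tags.tolist())  # ['python,pandas,', 'regex,']

# .str.rstrip(",") strips trailing commas from each string;
# .str.rstrip() with no argument would only strip whitespace
print(tags.str.rstrip(",").tolist())  # ['python,pandas', 'regex']

# .str[:-1] slices each string, removing its last character
print(tags.str[:-1].tolist())  # ['python,pandas', 'regex']

# Without .str, [:-1] slices the Series itself and drops the last row
print(tags[:-1].tolist())  # ['python,pandas,']
```

str.rstrip(",") is the safer of the two, since str[:-1] removes the last character whether or not it is a comma.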
I don’t seem to get the right combination/order of steps for removing the commas and <>. When I replace “>” with “,” followed by str.rstrip, the <> are added back into the strings. Is there another, easier way to do this?
In the solution,
questions["Tags"] = questions["Tags"].str.replace("^<|>$", "", regex=True).str.split("><")
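To see what each step of that solution does, here is a sketch on hypothetical sample rows. It passes regex=True explicitly because, from pandas 2.0 on, str.replace treats the pattern as a literal string unless told otherwise:

```python
import pandas as pd

# Hypothetical sample rows in the "<tag1><tag2>" format
questions = pd.DataFrame({"Tags": ["<python><pandas>", "<regex>"]})

# Remove the "<" at the start and the ">" at the end of each string
stripped = questions["Tags"].str.replace("^<|>$", "", regex=True)
print(stripped.tolist())  # ['python><pandas', 'regex']

# Only the inner "><" separators remain, so splitting on them gives one list per row
questions["Tags"] = stripped.str.split("><")
print(questions["Tags"].tolist())  # [['python', 'pandas'], ['regex']]
```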
I do not understand why ^ and $ are in the str.replace pattern. What steps should I take and what logic should I follow when doing so? Thanks for your help.
You will have to share your exact code to be sure of what mistake you are making. But it seems like you are not saving the results back to
I would recommend going through the regex (regular expressions) sections again to understand what that specific pattern does. It would also be helpful to check out the
replace() documentation to understand how it works.
Basics-Copy1.ipynb (11.4 KB)
After reading more on regex and str.replace, I think I understand it better now.
So the pattern ^<|>$ matches < or >, at the beginning or end of a string, and replaces it with nothing. Is this correct?
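A slightly more precise reading: the ^ anchor belongs to the < and the $ anchor belongs to the >, so the pattern means "a < at the start, or a > at the end". A quick check with Python's re module on made-up strings (pandas str.replace with regex=True uses the same regex syntax):

```python
import re

pattern = r"^<|>$"  # "<" anchored to the start OR ">" anchored to the end
s = "<python><pandas>"

# Positions of every match: only the first and last characters qualify
print([m.start() for m in re.finditer(pattern, s)])  # [0, 15]

# The inner "><" is left alone, so it can serve as a separator later
print(re.sub(pattern, "", s))  # python><pandas

# Each anchor binds to its own side of "|": a ">" at the start
# or a "<" at the end is NOT matched
print(re.sub(pattern, "", ">odd<"))  # >odd<
```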
I uploaded the code I am working on at the top of this section. Is there a way to remove “<” from just the beginning of a string? When I tried
questions['Tags'] = questions['Tags'].str.replace("^<", "")
the result was NaN for questions['Tags']. I was trying to see whether there is a different way to solve this step besides the provided solution.
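Two ways that should remove only a leading "<", sketched on hypothetical sample rows. The last lines illustrate one possible cause of the NaN, which is an assumption since the notebook contents are not shown here: .str methods return NaN for values that are not strings, such as lists left over from an earlier str.split().

```python
import pandas as pd

# Hypothetical sample rows in the "<tag1><tag2>" format
questions = pd.DataFrame({"Tags": ["<python><pandas>", "<regex>"]})

# Anchored regex: only a "<" at the very start of each string is removed
# (regex=True is needed on pandas 2.0+, where literal matching is the default)
print(questions["Tags"].str.replace("^<", "", regex=True).tolist())
# ['python><pandas>', 'regex>']

# Non-regex alternative: strip leading "<" characters
print(questions["Tags"].str.lstrip("<").tolist())
# ['python><pandas>', 'regex>']

# If the column already holds lists (e.g. after str.split), .str.replace
# has no strings to work on and returns NaN for every row
as_lists = questions["Tags"].str.split("><")
print(as_lists.str.replace("^<", "", regex=True).isna().all())  # True
```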
I did it a bit differently: I replaced “><” with commas first, then removed the “<” and “>” at the start and end of the string:
dsse['Tags'] = dsse['Tags'].str.replace("><", ",").str.replace(r'[<>]', "", regex=True)
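A sketch of this replace-first approach on hypothetical rows; regex=True is spelled out for the character class because recent pandas versions default to literal matching:

```python
import pandas as pd

# Hypothetical sample data; "dsse" mirrors the variable name in the post
dsse = pd.DataFrame({"Tags": ["<python><pandas>", "<regex>"]})

dsse["Tags"] = (
    dsse["Tags"]
    # 1. Turn every inner "><" separator into a comma (literal match)
    .str.replace("><", ",", regex=False)
    # 2. Drop the remaining outer "<" and ">" (character class, so regex=True)
    .str.replace(r"[<>]", "", regex=True)
)
print(dsse["Tags"].tolist())  # ['python,pandas', 'regex']
```

The order matters: [<>] removes every angle bracket anywhere in the string, so the inner "><" has to be converted to a comma before that step runs.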
Another solution to the “last comma in each list” problem could be to add a simple lookahead pattern to the end of your chain:
questions["Tags"] = questions["Tags"].str.replace("<","").str.replace(">",",").str.replace(r',(?!.)',"", regex=True)
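In this case, the comma is matched only when it is not followed by the special character . (any character except a line break). The (?!.) part is a negative lookahead, so only a comma that is the last character of the string is removed.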
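A minimal sketch of the lookahead version on hypothetical rows, plus a check on a plain string that inner commas are left untouched:

```python
import re

import pandas as pd

# Hypothetical sample rows in the "<tag1><tag2>" format
questions = pd.DataFrame({"Tags": ["<python><pandas>", "<regex>"]})

tags = (
    questions["Tags"]
    .str.replace("<", "", regex=False)
    .str.replace(">", ",", regex=False)
    # Negative lookahead: remove a comma only if no character follows it
    .str.replace(r",(?!.)", "", regex=True)
)
print(tags.tolist())  # ['python,pandas', 'regex']

# Inner commas are followed by a character, so they survive
print(re.sub(r",(?!.)", "", "a,b,"))  # a,b
```

For single-line strings like these, the anchored pattern ",$" should behave the same way; the lookahead form just states the "nothing follows" condition explicitly.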