Removing last comma in string list

Screen Link:

https://app.dataquest.io/m/469/guided-project%3A-popular-data-science-questions/6/cleaning-the-data
My Code:

questions['Tags'].str.replace("<", "").str.replace(">", ",")

What I expected to happen:
0 machine-learning,data-mining,
1 machine-learning,regression,linear-regression,…
2 python,time-series,forecast,forecasting,
3 machine-learning,scikit-learn,pca,
4 dataset,bigdata,data,speech-to-text,

The output is as expected. But how do I remove the last comma in each list? I tried
questions['Tags'] = questions['Tags'][:-1] and questions['Tags'].rstrip(). For rstrip I got this error:

AttributeError: 'Series' object has no attribute 'rstrip'

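As the error points out, a Series does not have an rstrip method. rstrip is a string method, so you need to call it through the .str accessor: questions['Tags'].str.rstrip(","). Note that rstrip() with no argument only strips whitespace, so the comma has to be passed explicitly.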

The same applies to questions['Tags'][:-1]: you need to use .str[:-1], because otherwise you are slicing the rows of the Series instead of the string value in each row.

The .str accessor is what applies an operation to the string value in each row.
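To make the difference concrete, here is a minimal sketch. The Series `tags` and its values are made up for illustration; they only mimic the shape of the output in the question.

```python
import pandas as pd

# Hypothetical sample data shaped like the output in the question.
tags = pd.Series(["machine-learning,data-mining,", "python,time-series,"])

# Slicing the Series itself drops the last ROW, not the last character.
rows_dropped = tags[:-1]

# The .str accessor applies the operation to each string value instead.
stripped = tags.str.rstrip(",")  # remove trailing commas from every string
sliced = tags.str[:-1]           # drop the last character of every string

print(len(rows_dropped))
print(stripped.tolist())
```

Both `.str.rstrip(",")` and `.str[:-1]` give the same result here, but `rstrip(",")` is probably the safer choice because it leaves strings without a trailing comma untouched.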


I don't seem to get the right combination/order of steps for removing the commas and <>. When I replace ">" with "," followed by str.rstrip, the <> are added back into the strings. Is there another, easier way to do this?
In the solution

questions["Tags"] = questions["Tags"].str.replace("^<|>$", "").str.split("><")

I do not understand why ^|$ are in str.replace. What steps should I take and what logic should I follow when doing so? Thanks for your help.


You will have to share your exact code to be sure of what mistake you are making. But it seems like you are not saving the results back to questions['Tags'].

I would recommend going through the regex (regular expressions) sections again to understand what that specific pattern does. It would also be helpful if you checked out replace() documentation to understand how it works.
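As a starting point, here is a small sketch of what that pattern does, using Python's built-in re module on a made-up tag string (the value of `raw` is an assumption, not taken from the dataset). As far as I know, pandas 2.0 and later treat the pattern in str.replace literally unless regex=True is passed, so the flag may need to be added to the solution line on newer versions.

```python
import re

# Hypothetical raw value in the "<tag1><tag2>" format.
raw = "<machine-learning><data-mining>"

# "^<" matches "<" only at the start of the string;
# ">$" matches ">" only at the end of the string;
# "|" means "either of the two".
trimmed = re.sub(r"^<|>$", "", raw)

# Only the inner "><" separators remain, so splitting on them gives a list.
tag_list = trimmed.split("><")

print(trimmed)
print(tag_list)
```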


Basics-Copy1.ipynb (11.4 KB)
After reading more on regex and str.replace, I think I understand it better now.

str.replace("^<|>$", "")

matches < or >, at the beginning or end of a string, and replaces it with nothing. Is this correct?

I uploaded the code I am working on at the top of this section. Is there a way to remove "<" from just the beginning of a string? When I tried

questions['Tags'] = questions['Tags'].str.replace("^<", "")

the result was NaN for questions['Tags']. I was trying to see if there is a different way to solve this step besides the provided solution.


I did it a bit differently: I replaced the "><" with commas first, then removed the "<" and ">" at the start and end of the string:

dsse['Tags'] = dsse['Tags'].str.replace("><", ",").str.replace(r'[<>]', "")
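For anyone trying this on a recent pandas, here is a sketch of the same two-step approach with the regex flag spelled out. The sample DataFrame is made up. Passing regex explicitly is my addition: as far as I know, the default changed to regex=False in pandas 2.0, in which case r'[<>]' would be treated as literal text.

```python
import pandas as pd

# Hypothetical sample data in the raw "<tag1><tag2>" format.
dsse = pd.DataFrame({"Tags": ["<python><pandas>", "<regex>"]})

dsse["Tags"] = (
    dsse["Tags"]
    .str.replace("><", ",", regex=False)   # literal: inner separators become commas
    .str.replace(r"[<>]", "", regex=True)  # regex: drop the remaining "<" and ">"
)

print(dsse["Tags"].tolist())
```

Because the separators are replaced first, no trailing comma is ever created, so there is nothing left to strip afterwards.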

Another solution to the "last comma in each list" problem could be to add a simple regex pattern with a lookaround to the end of your chain:

questions["Tags"] = questions["Tags"].str.replace("<","").str.replace(">",",").str.replace(r',(?!.)',"")

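In this case the comma is matched only when it is not followed by ".", the special character that matches any character except a line break. In other words, the negative lookahead (?!.) restricts the match to a comma at the very end of the string.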
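A quick way to check this is with Python's re module on a single made-up string, comparing the lookahead with two simpler alternatives. This is only a sketch; the sample string is an assumption shaped like the output in the original question.

```python
import re

# Hypothetical value with a trailing comma, as in the question's output.
s = "python,time-series,forecast,"

with_lookahead = re.sub(r",(?!.)", "", s)  # comma not followed by any character
with_anchor = re.sub(r",$", "", s)         # comma at the end of the string
with_rstrip = s.rstrip(",")                # plain string method, no regex

print(with_lookahead)
print(with_lookahead == with_anchor == with_rstrip)
```

For single-line strings like these all three give the same result, so the choice is mostly a matter of readability.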