Cleaning Tags - string split (469-6)

I am reading the proposed solution, where we are cleaning and splitting Tags.
The solution code is the following and I am not sure that I fully understand it.

questions["Tags"] = questions["Tags"].str.replace("^<|>$", "", regex=True).str.split("><")

The way I read the regex: find “<” at the beginning of the string, or “>” at the end of the string, and replace it with nothing. What confuses me is the split that follows: if we have already replaced the angle brackets, how can we split on angle brackets?

P.S. not relevant to this topic - for some reason I can’t add lesson-screen numbers to tags.

Hi! I haven’t done this particular lesson but I have a guess at what’s going on.

I’m assuming that the values of your “Tags” column are strings of HTML tags, where each would look something like this: “<tag1><tag2><tag3>”

You’ve correctly interpreted that regex: it searches for an opening bracket ("<") at the very beginning of the string and a closing bracket (">") at the very end, then replaces each match with nothing. The “^” and “$” anchors restrict where the regex can match, so it does not replace every “<” or “>” character, only the first and last ones. This step would leave my example tag string as: “tag1><tag2><tag3”
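
To make the effect of the anchors concrete, here is a small sketch using Python's built-in `re` module on my made-up example string (the tag names are placeholders, not values from the lesson's dataset). It compares the anchored pattern with an unanchored one:

```python
import re

example = "<tag1><tag2><tag3>"  # hypothetical tag string, for illustration only

# Anchored pattern: only the leading "<" and the trailing ">" are removed.
anchored = re.sub(r"^<|>$", "", example)
print(anchored)  # tag1><tag2><tag3

# Unanchored pattern, for comparison: every bracket is removed.
unanchored = re.sub(r"<|>", "", example)
print(unanchored)  # tag1tag2tag3
```

The inner “><” pairs survive the anchored replacement, which is why there is still something left to split on.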

If we then split that string by using “><” as our separator, can you see how that leaves us with (what I assume is) the desired outcome of a list of tags?
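
Here is a rough sketch of both steps in pandas. The `questions` DataFrame below is a tiny stand-in I made up, not the lesson's data, and I pass `regex=True` explicitly because recent pandas versions treat the pattern in `str.replace` as a literal string by default:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's `questions` DataFrame.
questions = pd.DataFrame({"Tags": ["<tag1><tag2><tag3>", "<python>"]})

# Step 1: strip only the outermost brackets of each string.
stripped = questions["Tags"].str.replace("^<|>$", "", regex=True)

# Step 2: split on the "><" pairs that remain between tags.
tag_lists = stripped.str.split("><")

print(tag_lists.tolist())  # [['tag1', 'tag2', 'tag3'], ['python']]
```

A string with a single tag has no “><” left after step 1, so it simply becomes a one-element list.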

Hope this helps!
