Cleaning Tags - string split (469-6)

I am reading the proposed solution, where we are cleaning and splitting Tags.
The solution code is the following and I am not sure that I fully understand it.

questions["Tags"] = questions["Tags"].str.replace("^<|>$", "", regex=True).str.split("><")

The way I read the regex: find “<” at the beginning of the string or “>” at the end of the string, and replace it with nothing. What confuses me is this part:

str.split("><")

If we have already replaced the angle brackets, how can we split on angle brackets?

P.S. not relevant to this topic - for some reason I can’t add lesson-screen numbers to tags.

Hi! I haven’t done this particular lesson but I have a guess at what’s going on.

I’m assuming that the values of your “Tags” column are strings of HTML-style tags, where each would look something like this: “<tag1><tag2><tag3>”

You’ve correctly interpreted that regex: it matches an opening bracket ("<") at the very beginning of the string or a closing bracket (">") at the very end, and replaces each match with nothing. The “^” and “$” anchors restrict where matches can occur, so the regex does not replace every “<” or “>” character, only the outermost pair. This step would leave my example tag string as: “tag1><tag2><tag3”

If we then split that string by using “><” as our separator, can you see how that leaves us with (what I assume is) the desired outcome of a list of tags?
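To make the two steps concrete, here is a minimal sketch in plain Python. The sample string is made up, and I’m using `re.sub` and the built-in `str.split` as stand-ins for what pandas’ `.str.replace` and `.str.split` do to each value in the column, so treat it as an illustration rather than the lesson’s exact code.

```python
import re

# Hypothetical value from the "Tags" column (not taken from the real dataset).
tags = "<tag1><tag2><tag3>"

# Step 1: remove only the leading "<" and the trailing ">".
# The inner "><" pairs are untouched because of the ^ and $ anchors.
stripped = re.sub(r"^<|>$", "", tags)
print(stripped)  # tag1><tag2><tag3

# Step 2: split on the "><" pairs that still sit between the tags.
tag_list = stripped.split("><")
print(tag_list)  # ['tag1', 'tag2', 'tag3']
```

One thing to watch for if you run the pandas version yourself: in recent pandas releases, `.str.replace` treats the pattern as a literal string unless you pass `regex=True`.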

Hope this helps!
