Hi, I want to analyze my Netflix viewing data. There is a column “Title” that contains the titles of the shows I watched. These are movies like “Karate Kid” and series like “Cobra Kai”. A series episode is printed in the first pattern below, a movie in the second:
SpongeBob Schwammkopf: Staffel 4: Der allerschönste Tag / Gummilein (Folge 20)
Karate Kid IV – Die nächste Generation
I need a regex that extracts only the relevant part: for a series episode the series name and season (e.g. “SpongeBob Schwammkopf: Staffel 4”), for a movie the whole title. I have no problem doing this for a movie title, but doing it for an episode of a series is hard for me. (It also needs to work without colons.) I tried this:
I thought I could extract everything before a possible first colon and before a possible second colon, but it did not work.
Any help is appreciated. Best regards, and thank you.
I don’t understand what each string actually is. For movies, for instance, it looks like you want to extract everything.
Absolutely. I need a pattern that extracts the whole title for a movie, but for a series episode only the part before the first colon and the part between the first and second colon, like “TitleOfTheSeries: Season”, and not the rest.
I don’t understand what the dataset looks like, but the pattern [\w\s]*:[\w\s]*(?=:) works to capture “SpongeBob Schwammkopf: Staffel 4” in “SpongeBob Schwammkopf: Staffel 4: Der allerschönste Tag / Gummilein (Folge 20)”.
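The suggested pattern can be tried out with Python's standard re module. This minimal sketch (the variable names are only illustrative) also shows its limitation for movies: a title with no colon produces no match at all.

```python
import re

# Word/space characters, a colon, more word/space characters,
# stopping right before a second colon (positive lookahead).
SERIES_PATTERN = re.compile(r"[\w\s]*:[\w\s]*(?=:)")

series = "SpongeBob Schwammkopf: Staffel 4: Der allerschönste Tag / Gummilein (Folge 20)"
movie = "Karate Kid IV – Die nächste Generation"

m = SERIES_PATTERN.search(series)
print(m.group() if m else None)  # SpongeBob Schwammkopf: Staffel 4

# The movie title contains no colon, so the pattern cannot match anywhere.
print(SERIES_PATTERN.search(movie))  # None
```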
Many thanks. Yes, it works for an episode of a series with this pattern.
I learned a lot from seeing how you used the square brackets, the asterisk and the positive lookahead. Thank you.
But the regex should also work when there is no colon, i.e. for a movie like Karate Kid IV or Critters.
Based on your pattern, I tried one that only searches for titles without a colon:
r'([\w\s]*(?!:))' But it did not work.
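Running the attempted pattern against the three example titles makes the two problems visible. This is a small sketch with Python's re module; the list of titles is just the examples from this thread. First, the negative lookahead does not stop the greedy [\w\s]* at the colon: when the lookahead fails, the engine simply gives back one character and tries again. Second, [\w\s] excludes punctuation such as the en dash, so longer movie titles get cut off.

```python
import re

# The attempted pattern: a greedy run of word/space characters,
# followed by a check that the next character is not a colon.
ATTEMPT = re.compile(r"([\w\s]*(?!:))")

titles = [
    "SpongeBob Schwammkopf: Staffel 4: Der allerschönste Tag / Gummilein (Folge 20)",
    "Karate Kid IV – Die nächste Generation",
    "Critters",
]

for title in titles:
    # match() always succeeds here, because the pattern can match an empty string.
    print(repr(ATTEMPT.match(title).group()))
# 'SpongeBob Schwammkop'  (backtracked one character so the next one is not ':')
# 'Karate Kid IV '        (stops at the en dash, which is neither \w nor \s)
# 'Critters'
```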
So I need a way to handle both kinds of title: with and without a colon. If there is a colon, I only want the part before the first colon and the part after it (up to a possible second colon), nothing more. Thank you for your time and advice.
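One way to cover both cases with a single regex is to match "everything up to a second colon". The sketch below swaps the [\w\s] class for a negated class [^:], which also accepts punctuation such as the en dash in “Karate Kid IV – Die nächste Generation”. It assumes that only series episodes contain two or more colons; a movie title with two colons would also be cut at the second one. The helper name extract_title is hypothetical.

```python
import re

# Everything up to (but not including) a second colon:
#   [^:]*         the part before the first colon, or the whole title if there is none
#   (?::[^:]*)?   optionally, the first colon plus everything up to the next colon
TITLE_PATTERN = re.compile(r"[^:]*(?::[^:]*)?")

def extract_title(title: str) -> str:
    """Hypothetical helper: 'Series: Season' for an episode, the full title for a movie."""
    # match() anchors at the start; the pattern can match an empty string, so it never fails.
    return TITLE_PATTERN.match(title).group().strip()

for title in [
    "SpongeBob Schwammkopf: Staffel 4: Der allerschönste Tag / Gummilein (Folge 20)",
    "Karate Kid IV – Die nächste Generation",
    "Critters",
]:
    print(extract_title(title))
# SpongeBob Schwammkopf: Staffel 4
# Karate Kid IV – Die nächste Generation
# Critters
```

If the titles live in a pandas DataFrame (assumed here to be called df), the same idea can be applied to the whole column with df["Title"].str.extract(r"^([^:]*(?::[^:]*)?)", expand=False); str.extract needs a capturing group, hence the extra parentheses.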
Ultimately, though, this is a game: because you can use other tools in addition to regular expressions (e.g. Python and if/else statements), there’s no need to catch all cases with a single regular expression. I like doing it for fun, not to get things done.
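As an illustration of the non-regex route, here is a minimal sketch using str.split and an if/else. Like a regex solution, it assumes that two or more colons mean “Series: Season: Episode”; the function name shorten_title is hypothetical.

```python
def shorten_title(title: str) -> str:
    """Hypothetical helper: keep 'Series: Season' for episodes, the full title otherwise."""
    parts = title.split(":")
    if len(parts) > 2:
        # Two or more colons: assume "Series: Season: Episode" and keep the first two parts.
        return ":".join(parts[:2]).strip()
    else:
        # Zero or one colon: treat it as a movie and keep the whole title.
        return title.strip()


print(shorten_title("SpongeBob Schwammkopf: Staffel 4: Der allerschönste Tag / Gummilein (Folge 20)"))
# SpongeBob Schwammkopf: Staffel 4
print(shorten_title("Critters"))
# Critters
```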