I'm going through the regular expressions mission, and it recommends that we always use raw strings. But I got really confused, because some backslashes don't seem to need escaping. For example, in the `re.search()` function, `'\s'` is never matched as the literal text `\s`, whether or not it's in a raw string, but `r'\b'` is passed through as `\b` while `'\b'` is not. Similarly, the `r` prefix does not seem to do anything to `'\s'`:
```python
import re

string = r'This is a test. There is an a\s in this string.'
pattern_a = 'a\s'
patternraw_a = r'a\s'

result = re.search(pattern_a, string)
print(result)
# <re.Match object; span=(8, 10), match='a '>

resultraw = re.search(patternraw_a, string)
print(resultraw)
# <re.Match object; span=(8, 10), match='a '>
```
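A minimal sketch of what seems to be happening with `\s`: Python has no `\s` escape of its own, so it hands `re` the same three characters either way, and `re` reads backslash + `s` as "any whitespace character". Matching the literal backslash in the test string takes one more level of escaping. The variable names `whitespace_match` and `literal_match` are illustrative, not from the mission.

```python
import re

string = r'This is a test. There is an a\s in this string.'

# Python has no '\s' escape, so a normal string keeps the backslash.
# Both literals below are the same 3 characters: 'a', backslash, 's'.
print('a\\s' == r'a\s')           # True

# re then reads backslash + 's' as "any whitespace character"
whitespace_match = re.search(r'a\s', string)
print(whitespace_match)           # <re.Match object; span=(8, 10), match='a '>

# To match the literal text a\s, the regex needs an escaped backslash:
# r'a\\s' as a raw string, or 'a\\\\s' as a normal string
literal_match = re.search(r'a\\s', string)
print(literal_match)              # <re.Match object; span=(28, 31), match='a\\s'>
print(r'a\\s' == 'a\\\\s')        # True
```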
```python
string_b = r'This is a test. There is a test\b in this string.'
pattern_b = 'test\b'
patternraw_b = r'test\b'

result_b = re.search(pattern_b, string_b)
print(result_b)
# None

resultraw_b = re.search(patternraw_b, string_b)
print(resultraw_b)
# <re.Match object; span=(10, 14), match='test'>
```
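A minimal sketch of why `\b` behaves differently: Python's string-literal parser runs before `re` sees anything, and `\b` is one of Python's own escapes (backspace, `chr(8)`), while `\s` is not. The names `backspace_pattern` and `boundary_pattern` are illustrative.

```python
import re

text = r'This is a test. There is a test\b in this string.'

# Normal string: Python converts \b into a single backspace character (chr(8))
backspace_pattern = 'test\b'
# Raw string: the backslash and 'b' are kept as two separate characters
boundary_pattern = r'test\b'

print(len(backspace_pattern), len(boundary_pattern))  # 5 6
print(backspace_pattern == 'test' + chr(8))           # True

# re now receives two different patterns:
# 'test' + backspace: the text contains no backspace, so there is no match
print(re.search(backspace_pattern, text))   # None
# 'test' + regex word boundary: matches the first whole-word 'test'
print(re.search(boundary_pattern, text))    # <re.Match object; span=(10, 14), match='test'>

# Doubling the backslash in a normal string gives re the same two characters
print('test\\b' == boundary_pattern)        # True
```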
Another related confusion: `'\s'` always seems to be printed literally by `print()`, but `'\n'` is not, and it has to be escaped as `'\\n'` (or written as `r'\n'`) to print literally:

```python
print('\s')    # \s
print(r'\s')   # \s
print('\\s')   # \s
print(r'\\s')  # \\s

print('\\n')   # \n
print('\n')    # (prints an empty line)
print(r'\n')   # \n
```
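A minimal sketch of what `print()` is doing: it only shows the characters a string already contains, and the escapes were processed earlier, when Python read the literal. `repr()` and `len()` make the difference visible. The variable names are illustrative.

```python
# print() never interprets escapes itself; the literal was already parsed.
newline = '\n'      # one character: an actual line break
escaped = '\\n'     # two characters: a backslash, then 'n'
raw_n = r'\n'       # also two characters: a backslash, then 'n'

print(len(newline), len(escaped), len(raw_n))  # 1 2 2
print(escaped == raw_n)                         # True

# repr() shows the escapes, so the difference becomes visible
print(repr(newline))   # '\n'
print(repr(raw_n))     # '\\n'

# Same rule for \s: '\\s' holds one backslash, r'\\s' keeps both
print(len('\\s'), len(r'\\s'))  # 2 3
```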
My question is: how do you remember what is escapable and what is not in different functions? There must be a pattern that I'm not seeing.
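One way to see the pattern is to separate two layers: Python's string-literal parser (the same for every function, including `print()` and `re.search()`) and the `re` module's own parser. The sketch below uses a hypothetical helper, `python_literal_value`, which relies on `ast.literal_eval` to show what the first layer does with each escape; it is an illustration under that assumption, not part of the mission.

```python
import ast
import warnings


def python_literal_value(body):
    """Hypothetical helper: evaluate body as if written inside '...' in code."""
    with warnings.catch_warnings():
        # Unknown escapes such as \s raise a DeprecationWarning
        # (SyntaxWarning since Python 3.12); silence it for this demo.
        warnings.simplefilter('ignore')
        return ast.literal_eval("'" + body + "'")


for escape in ['\\n', '\\t', '\\b', '\\s', '\\d', '\\w']:
    value = python_literal_value(escape)
    # True when Python replaced the backslash sequence with something else
    consumed = value != escape
    print(f'{escape:4} -> {value!r:8} consumed by Python: {consumed}')
```

Only escapes on Python's own list (such as `\n`, `\t`, `\b`) are consumed in the first layer; `\s`, `\d` and `\w` pass through unchanged, which is why they happen to work without `r`. `print()` only shows the result of that first layer, while `re.search()` adds a second layer with its own escape table. A raw string switches the first layer off, so every backslash sequence reaches `re` untouched.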
Thank you for your help!