Confusion with regular expressions

I’m going through the regular expression mission, and it is recommended that we always use raw strings. But I got really confused about why some backslashes seem to be escaped and others do not. For example, in `re.search()`, `r'\s'` is never matched as a literal `\s` (it always matches whitespace), whether or not it is in a raw string, but with `\b` it makes a difference whether the string is raw. In the example below, `'a\s'` and `r'a\s'` give exactly the same result:

```python
import re

string = r'This is a test. There is an a\s in this string.'
pattern_a = 'a\s'       # normal string
patternraw_a = r'a\s'   # raw string
result = re.search(pattern_a, string)
print(result)
# <re.Match object; span=(8, 10), match='a '>
resultraw = re.search(patternraw_a, string)
print(resultraw)
# <re.Match object; span=(8, 10), match='a '>

string_b = r'This is a test. There is a test\b in this string.'
pattern_b = 'test\b'       # normal string
patternraw_b = r'test\b'   # raw string
result_b = re.search(pattern_b, string_b)
print(result_b)
# None
resultraw_b = re.search(patternraw_b, string_b)
print(resultraw_b)
# <re.Match object; span=(10, 14), match='test'>
```
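
To see why the two `\b` patterns behave differently, it can help to look at the strings themselves before `re` ever sees them. A minimal sketch using only `repr()`, `len()` and `re.search()`; the sample text mirrors `string_b` above:

```python
import re

# Python's string parser turns '\b' into ONE character, a backspace (chr(8)).
# The raw string keeps TWO characters: a backslash followed by 'b'.
pattern_b = 'test\b'
patternraw_b = r'test\b'
print(repr(pattern_b), len(pattern_b))        # 'test\x08' 5
print(repr(patternraw_b), len(patternraw_b))  # 'test\\b' 6

# Doubling the backslash in a normal string produces the same text as the
# raw string, so re receives \b and reads it as a word boundary.
print('test\\b' == patternraw_b)  # True

text = 'This is a test. There is a test\\b in this string.'
print(re.search(pattern_b, text))             # None (no backspace in text)
print(re.search(patternraw_b, text).group())  # test
```

With `'a\s'` there is no such difference: Python has no `\s` escape and leaves the backslash in place (recent Python versions warn about it), so `'a\s'` and `r'a\s'` are already the same string before `re` sees them.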

Another related confusion is that `'\s'` always seems to be printed literally by `print()`, but `'\n'` is not, and it can be escaped as `'\\n'`:

```python
print('\s')
print(r'\s')
print('\\s')
print(r'\\s')
```

[outs:]
```
\s
\s
\s
\\s
```

```python
print('\\n')
print('\n')
print(r'\n')
```

[outs:]
```
\n


\n
```
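
The same rule covers the `print()` results: whether a sequence is special is decided once, when Python reads the string literal, and `print()` just shows the resulting characters. A small sketch that compares the strings directly instead of printing them (the variable names are only for illustration):

```python
# '\n' is one of Python's recognised escapes: it becomes a single newline.
newline = '\n'
# A doubled backslash, or a raw string, gives two characters: '\' and 'n'.
escaped_n = '\\n'
raw_n = r'\n'
# Python has no '\s' escape, so '\\s' and r'\s' are the same two characters.
# (A plain '\s' ends up as these two characters too; recent Pythons warn.)
escaped_s = '\\s'
raw_s = r'\s'
# A raw string keeps every backslash, so r'\\s' has three characters.
raw_double_s = r'\\s'

print(len(newline), len(escaped_n), len(raw_n))  # 1 2 2
print(escaped_n == raw_n, escaped_s == raw_s)    # True True
print(len(raw_double_s))                         # 3
```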

My question is: how do you remember which sequences are escapes and which are not, under different functions? There must be a pattern that I’m not seeing.
Thank you for your help!


Hey @xiwei.shan, I have edited your post to format your code blocks with triple backticks (```). See the guide: How to use triple back ticks to format a code block.

`\s` is not a supported escape sequence in Python string literals, so Python keeps the backslash as an ordinary character. You can read more in the Python 3 documentation on string literals.
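
Because a pattern passes through two layers (Python's string-literal parser first, then the `re` module), matching a literal backslash is where the difference is easiest to see. A minimal sketch; the Windows-style path is just made-up sample text:

```python
import re

text = r'C:\temp'  # raw string: contains one real backslash at index 2

# To match one literal backslash, re must receive the two characters \\ .
# In a raw string that is r'\\'; in a normal string each of those
# backslashes must itself be escaped, giving '\\\\'.
print(r'\\' == '\\\\')                  # True
print(re.search(r'\\', text).start())   # 2
print(re.search('\\\\', text).start())  # 2

# '\\' (a normal string) reaches re as a single lone backslash,
# which re rejects as an incomplete escape.
try:
    re.compile('\\')
except re.error as exc:
    print('re.error:', exc)
```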

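You have to check each module's documentation to see which escape sequences it supports: Python's string literals and the `re` module each have their own list, and a regex pattern goes through both.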
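
As an illustration of each layer having its own list, the `re` module rejects letter escapes it does not define, while it gives meaning to escapes such as `\d` and `\s` that Python's string parser ignores. A short sketch, assuming Python 3.6 or later (where unknown ASCII-letter escapes in patterns became errors):

```python
import re

# r'\q' reaches re as backslash + 'q'; re has no \q escape, so it refuses.
try:
    re.compile(r'\q')
    print('compiled')
except re.error as exc:
    print('re.error:', exc)

# \d and \s mean nothing special to Python's string parser,
# but re defines them as the digit and whitespace classes.
print(re.findall(r'\d', 'a1 b2'))  # ['1', '2']
print(re.findall(r'\s', 'a1 b2'))  # [' ']
```

Writing every pattern as a raw string means only the `re` list matters, which is why raw strings are the usual recommendation.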
