Using Capture Groups in a RegEx Pattern


Hi, I’m currently working through Advanced Regular Expressions in the R Data Analyst path. The lesson I’m on is “BackReferences: Using Capture Groups in a RegEx Pattern”.

Could someone elaborate on the following RegEx?

Screen Link:
https://app.dataquest.io/m/400/advanced-regular-expressions/6/backreferences-using-capture-groups-in-a-regex-pattern

Code from the lecture:

    library(stringr)  # provides str_match()

    test_cases <- c(
        "I'm going to read a book.",
        "Green is my favorite color.",
        "My name is Aaron.",
        "No doubles here.",
        "I have a pet eel."
    )

    print(str_match(test_cases, "(\\w)\\1"))
    #      [,1] [,2]
    # [1,] "oo" "o" 
    # [2,] "ee" "e" 
    # [3,] NA   NA  
    # [4,] NA   NA  
    # [5,] "ee" "e"

What I expected to happen:

    [1,] "o" "oo" 
    [2,] "e" "ee" 
    [3,] NA   NA  
    [4,] NA   NA  
    [5,] "e" "ee"

I’m not sure why the actual output is:

    [1,] "oo" "o" 
    [2,] "ee" "e" 
    [3,] NA   NA  
    [4,] NA   NA  
    [5,] "ee" "e"

Any further explanation would help! Thank you :smiley:

Hi there!
I’m not familiar with R, but as far as the str_match function goes, I think the first column is the full regex match: “(\w)\1” matches the “oo” in “book”, and that is what the first column shows.

The second column shows “o” because the group (\w) captures a single “o”. The backreference \1 that follows then requires that same character again, so the pattern as a whole matches “oo”.
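To make that concrete, here is a minimal sketch, assuming the stringr package is installed. The `words` vector and the variable names are made up for illustration and are not from the lesson.

```r
library(stringr)

# Hypothetical inputs, shortened from the sentences in the question.
words <- c("book", "eel", "Aaron", "here")

# (\\w) captures one word character as group 1; \\1 then requires that
# exact same character to appear again immediately afterwards.
m <- str_match(words, "(\\w)\\1")

full_match <- m[, 1]  # the whole text matched by the pattern
group_1    <- m[, 2]  # only what the parentheses captured

# Wherever there is a match, the full match is the captured character twice.
matched <- !is.na(full_match)
doubled <- paste0(group_1[matched], group_1[matched])

print(m)
```

“Aaron” gives NA here because “A” and “a” are different characters as far as the backreference is concerned.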

If you look at the example output for the str_match function here, you’ll see that the second column (and every column after it) in print(str_match()) holds the capture results for the groups you specify, in order.
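A pattern with two groups shows the column layout more clearly. This is only a sketch, again assuming stringr is available; the pattern and inputs are hypothetical and chosen just to produce a third column.

```r
library(stringr)

# Hypothetical inputs; "here" has no doubled letter, so it should not match.
words <- c("book", "eel", "here")

# Group 1: a word character that is immediately repeated (\\1).
# Group 2: the word character that follows the doubled pair.
m <- str_match(words, "(\\w)\\1(\\w)")

whole  <- m[, 1]  # column 1: the complete match
first  <- m[, 2]  # column 2: first capture group
second <- m[, 3]  # column 3: second capture group

print(m)
```

So the matrix has one column for the whole match plus one column per capture group, and a row of NA for any string that does not match.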

I hope my explanation does not cause more confusion. Haha.

Best,
AL