How does `.group()` work in the `extract_and_increment()` example?

Screen Link: https://app.dataquest.io/m/355/list-comprehensions-and-lambda-functions/8/lambda-functions

The function extract_and_increment() below extracts digits from a string using regex, groups them together using the .group() method, so we get a single integer, then adds one to that integer.

import re
def extract_and_increment(string):
    digits = re.search(r"\d+", string).group()
    incremented = int(digits) + 1
    return incremented

But I’m not seeing how it’s supposed to work. For example, if

string = "a1bb22ccc333dddd4444eeeee55555"

then my guess is that re.search(r"\d+", string) would match the digits in string, which would be 1, 22, 333, 4444 and 55555 the way regexr.com shows here:

[Screenshot: regexr.com highlighting 1, 22, 333, 4444 and 55555 in the test string]

But, that doesn’t seem to happen…

string = "a1bb22ccc333dddd4444eeeee55555"
re.search(r"\d+", string)

…generates the output:

<re.Match object; span=(1, 2), match='1'>

…it only seems to match the first digit, 1. What am I misunderstanding?

After that, what would .group() do? I’ve only seen .group() used with regexes that contain capture groups, but this example has no capture groups, so I’m not sure how to interpret it. If I apply .group() as follows…

digits = re.search(r"\d+", string).group()
digits

…which outputs…
'1'

…I’m not sure what to make of the output. Is it just telling me that there’s one group because there’s just one match?

After that, I understand. incremented = int(digits) + 1 casts digits (1 in this case) as an integer and adds 1 to that integer (resulting in 2 in this case):

incremented = int(digits) + 1
incremented

Out: 2

To summarize: My questions are about digits = re.search(r"\d+", string).group():

  1. Why does re.search() match only the first digit instead of all the digit “groups”?
  2. How is .group() supposed to work if there are no capture groups?

Hey. First off, we have a content bug on our end, which doesn’t invalidate your questions, but it’s a good starting point to answer them.

The relevant portion of the screen should read as follows.

For instance, this function below, which extracts the first sequence of digits from a string and then adds one to the resultant integer:

def extract_and_increment(string):
    digits = re.search(r"\d+", string).group()
    incremented = int(digits) + 1
    return incremented

Here’s an example:

>>> extract_and_increment("fd17epsteindidn'tkillhimselftimes1000")
18
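One caveat worth knowing: re.search returns None when nothing matches, so calling .group() on a string with no digits raises an AttributeError. Below is a minimal sketch of a guarded variant. The name extract_and_increment_safe and the choice to return None are my own assumptions for illustration, not part of the lesson.

```python
import re

def extract_and_increment_safe(string):
    # Hypothetical variant (not from the lesson): return None instead of
    # raising when the string contains no digits.
    match = re.search(r"\d+", string)
    if match is None:
        return None
    # match.group() is the first run of digits, as a string.
    return int(match.group()) + 1

with_digits = extract_and_increment_safe("fd17times1000")
without_digits = extract_and_increment_safe("no digits here")
print(with_digits)     # 18
print(without_digits)  # None
```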

Let’s go back to your example.

>>> string = "a1bb22ccc333dddd4444eeeee55555"
>>> re.search(r"\d+", string)
<re.Match object; span=(1, 2), match='1'>

We start with the first question.

The answer is that that’s how re.search is supposed to work:

[Screenshot: documentation entry for re.search]
It only finds the first occurrence in the whole string. Also, note that it says that it returns a match object.
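If you do want every run of digits, the way regexr highlights them, re.findall or re.finditer is the usual tool. Here is a quick sketch using your string; the variable names are just for illustration.

```python
import re

string = "a1bb22ccc333dddd4444eeeee55555"

# re.search stops at the first run of digits it finds.
first = re.search(r"\d+", string)
print(first.group())  # 1

# re.findall returns every non-overlapping match as a list of strings.
all_runs = re.findall(r"\d+", string)
print(all_runs)  # ['1', '22', '333', '4444', '55555']

# re.finditer yields one match object per match, so positions are available too.
spans = [m.span() for m in re.finditer(r"\d+", string)]
print(spans)  # [(1, 2), (4, 6), (9, 12), (16, 20), (25, 30)]
```

regexr highlights all matches because it runs the pattern globally by default, which corresponds to findall/finditer rather than search.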

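Now the second question. The method being used here, .group(), belongs to match objects. The following screenshot was taken from here. I’m pasting everything for completeness and self-containment, but to answer this question it’s enough to focus on the highlighted part: when .group() is called without arguments, the group number defaults to 0, and group 0 is the whole match.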

Consequently, in your example, re.search(r"\d+", string).group() is the same as re.search(r"\d+", string).group(0), which is the whole match. And the whole match is whatever re.search found, which is only the first match: '1'. So the output isn’t a count of groups; it’s the matched text itself.
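To see this in action, here is a small sketch comparing .group() with and without capture groups. The second pattern, ([a-z]+)(\d+), is one I made up purely to have some groups to show.

```python
import re

string = "a1bb22ccc333dddd4444eeeee55555"

# No capture groups: group() and group(0) both return the whole match.
match = re.search(r"\d+", string)
whole = match.group()
print(whole == match.group(0))  # True
print(match.groups())           # () since the pattern has no capture groups

# With capture groups (illustrative pattern): group 0 is still the whole
# match, and groups 1, 2, ... are the parenthesized parts.
grouped = re.search(r"([a-z]+)(\d+)", string)
print(grouped.group(0))  # a1
print(grouped.group(1))  # a
print(grouped.group(2))  # 1
```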

Very helpful, thanks!