In regular expressions, you can use parentheses () to create groups of characters and capture the matched text within those groups. This allows you to extract specific parts of a matched pattern and use them later in your code.
Here's an example to illustrate grouping and capturing:
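    import re

    # Comma-separated names: every "word space word" pair in this text is a name.
    text = "John Doe, jane doe, Jim Smith"
    # Raw string so \w reaches the regex engine unchanged.
    pattern = r"(\w+) (\w+)"

    matches = re.findall(pattern, text)
    print("Matches:", matches)

    for match in matches:
        print("First name:", match[0])
        print("Last name:", match[1])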
This example matches the first and last names of the people in a given text. The pattern (\w+) (\w+) defines two groups of word characters separated by a single space. When a pattern contains more than one group, the findall() function returns a list of tuples, one tuple per match, with one element for each group.
The output of this program is:
    Matches: [('John', 'Doe'), ('jane', 'doe'), ('Jim', 'Smith')]
    First name: John
    Last name: Doe
    First name: jane
    Last name: doe
    First name: Jim
    Last name: Smith
As you can see, the findall() function returns a list of tuples containing the first and last names of each person in the given text. The loop then prints each first and last name separately.
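The shape of the list that findall() returns depends on how many groups the pattern contains. The following is a minimal sketch of that behavior; the sample string and the variable names are made up for illustration.

```python
import re

sample = "John Doe, Jim Smith"  # illustrative input

# No groups: each item is the whole match as a string.
no_groups = re.findall(r"\w+ \w+", sample)

# One group: each item is the text of that group only.
one_group = re.findall(r"(\w+) \w+", sample)

# Two groups: each item is a tuple with one element per group.
two_groups = re.findall(r"(\w+) (\w+)", sample)

print(no_groups)   # ['John Doe', 'Jim Smith']
print(one_group)   # ['John', 'Jim']
print(two_groups)  # [('John', 'Doe'), ('Jim', 'Smith')]
```

If you need both the whole match and the individual groups, re.finditer() yields match objects instead, and each one exposes group(0), group(1), and so on.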
You can also use the captured groups in the replacement string when using the sub() function. For example:
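    import re

    # Comma-separated names: every "word space word" pair in this text is a name.
    text = "John Doe, jane doe, Jim Smith"
    pattern = r"(\w+) (\w+)"

    # \2 and \1 in the replacement refer to the second and first groups.
    new_text = re.sub(pattern, r"\2, \1", text)
    print("New text:", new_text)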
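This example uses the captured groups to swap the first and last names in the given text. In the replacement string, \1 and \2 are backreferences to the first and second groups, so the sub() function replaces each match with the contents of the second group, followed by a comma and a space, followed by the contents of the first group. The replacement is written as a raw string (r"...") so that Python passes the backslashes to the re module unchanged.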
The output of this program is:
    New text: Doe, John, doe, jane, Smith, Jim
The sub() function has swapped the first and last names of each person in the given text.
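Numbered backreferences such as \1 and \2 can become hard to read as a pattern grows. As a hedged sketch of two alternatives that the re module offers, the example below uses named groups and a replacement function. The group names first and last and the helper title_case are illustrative choices for this sketch.

```python
import re

text = "John Doe, jane doe, Jim Smith"

# (?P<name>...) gives a group a name in addition to its number.
pattern = r"(?P<first>\w+) (?P<last>\w+)"

# \g<name> in the replacement string refers to a named group.
swapped = re.sub(pattern, r"\g<last>, \g<first>", text)
print(swapped)  # Doe, John, doe, jane, Smith, Jim

# A function can also build the replacement from the match object.
def title_case(match):
    first = match.group("first").title()
    last = match.group("last").title()
    return f"{first} {last}"

normalized = re.sub(pattern, title_case, text)
print(normalized)  # John Doe, Jane Doe, Jim Smith
```

A replacement function is useful when the new text cannot be expressed as a fixed template, for example when each captured group needs to be transformed before it is reinserted.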