Backreferences let a regular expression refer to the text matched by an earlier capturing group in the same pattern. They are useful when a pattern must repeat a previously matched substring, such as a doubled word or a repeated character sequence.
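To create a backreference, write a backslash followed by the group number: \N, where N is the number of the capturing group you want to refer to. For example, the backreference \1 refers to the first group, \2 refers to the second group, and so on. In Python, write such patterns as raw strings (r"..."), because in an ordinary string literal "\1" is an octal character escape rather than a backreference.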
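As a small illustration of numbered groups, the sketch below uses \1 and \2 together to recognise four-character strings that read the same in both directions. The pattern and the helper name is_four_char_palindrome are made up for this example and are not part of the re module.

```python
import re

# Hypothetical helper for illustration: group 1 captures the first character,
# group 2 the second; \2 and \1 then require them again in reverse order.
FOUR_CHAR_PALINDROME = re.compile(r"(\w)(\w)\2\1")

def is_four_char_palindrome(word):
    """Return True if the whole word has the form ABBA, e.g. 'noon'."""
    return FOUR_CHAR_PALINDROME.fullmatch(word) is not None

print(is_four_char_palindrome("noon"))  # True
print(is_four_char_palindrome("abba"))  # True
print(is_four_char_palindrome("abab"))  # False

# Without the r prefix, "\1" is the character with code 1, not a backreference.
print("\1" == "\x01")  # True
```

Groups are numbered by the position of their opening parenthesis, counting from the left, so \2 here always refers to the second (\w).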
Here's an example of using a backreference to match repeated words:
    import re

    text = "The cat in the hat hat"
    pattern = r"\b(\w+)\b\s+\1"
    matches = re.findall(pattern, text)
    print(matches)
In this example, the pattern \b(\w+)\b\s+\1 matches a word boundary, one or more word characters (captured as group 1), another word boundary, one or more whitespace characters, and then the backreference \1, which matches exactly the text that group 1 captured. Because the pattern contains a single capturing group, findall() returns a list of the text captured by that group for each match, not the full matched text.
The output of this program is:
    ['hat']
As you can see, the findall() function has returned a list containing the captured group "hat", the word that appears twice in a row in the input string.
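The pattern above has two limitations worth noting: it is case-sensitive, so "The the" is not reported, and nothing anchors the end of \1, so "the theory" counts as a repeat. A possible variant is sketched below, assuming that case-insensitive matching via re.IGNORECASE and a trailing \b suit your input. The name find_repeated_words is a hypothetical helper, not a library function.

```python
import re

# Variant of the repeated-word pattern: the trailing \b stops \1 from matching
# only the start of a longer word, and IGNORECASE lets "The the" count.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_repeated_words(text):
    """Return (word, (start, end)) for each immediately repeated word."""
    return [(m.group(1), m.span()) for m in REPEATED_WORD.finditer(text)]

print(find_repeated_words("The cat in the hat hat"))
# [('hat', (15, 22))]
print(find_repeated_words("The the cat saw the theory"))
# [('The', (0, 7))]
```

Using finditer() instead of findall() gives access to the match objects, so the position of each repeat is available alongside the captured word.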
Backreferences can also be used in replacement strings passed to the sub() function, which replaces matched patterns with new strings. The simplest replacement, though, is a plain string with no backreferences. For example, the following code replaces all occurrences of "dog" followed by "cat" with "catdog":
    import re

    text = "The dog cat chased another dog cat"
    pattern = r"dog\s+cat"
    new_text = re.sub(pattern, "catdog", text)
    print(new_text)
In this example, the pattern dog\s+cat matches the substring "dog", followed by one or more whitespace characters, followed by the substring "cat". The sub() function replaces every occurrence of this pattern with the string "catdog". Since there are no capturing groups in the pattern, the replacement string contains no backreferences; if the pattern did contain groups, the replacement could refer to them as \1, \2, and so on.
The output of this program is:
    The catdog chased another catdog
As you can see, the sub() function has replaced both occurrences of "dog cat" with "catdog".
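For a replacement that does use backreferences, the pattern needs capturing groups. The sketch below shows two uses: swapping two captured words with \2\1, and collapsing a doubled word to a single copy with \1. The sample strings and variable names are illustrative only.

```python
import re

# Swap two captured words: group 1 is "dog", group 2 is "cat".
swapped = re.sub(r"(dog)\s+(cat)", r"\2\1", "The dog cat chased another dog cat")
print(swapped)  # The catdog chased another catdog

# Collapse an immediately repeated word into one copy of group 1.
deduped = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", "The cat in the hat hat")
print(deduped)  # The cat in the hat

# \g<1> is an unambiguous spelling of \1, useful when a digit follows:
# r"\10" would mean group 10, while r"\g<1>0" means group 1 followed by "0".
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(padded)  # a10b20
```

The replacement string is also written as a raw string here, for the same reason as the pattern: without the r prefix, Python would interpret "\1" and "\2" as character escapes before re.sub() ever saw them.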