A non-capturing group in a regular expression groups part of a pattern without storing the text it matches. This is useful when you need grouping only to apply an operator or quantifier to a sub-pattern and have no need to extract the text matched by that sub-pattern.
Non-capturing groups are created using the syntax (?:pattern), where pattern is the regular expression pattern that you want to group together.
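As a minimal sketch of the difference, the snippet below applies the `+` quantifier to a capturing group and to a non-capturing group. The sample string `"ababab"` and the variable names are assumptions chosen for illustration only.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "ababab"

# Capturing group: the last repetition of "ab" is stored as group 1.
capturing = re.fullmatch(r"(ab)+", text)

# Non-capturing group: the same text matches, but no group is stored.
non_capturing = re.fullmatch(r"(?:ab)+", text)

print(capturing.group(0), capturing.groups())          # ababab ('ab',)
print(non_capturing.group(0), non_capturing.groups())  # ababab ()
```

Both patterns match the whole string; only the capturing version leaves anything in `groups()`.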
For example, let's say you want to match a pattern that starts with "http://" or "https://", followed by a domain name and a path. You can use a non-capturing group to group together the "http://" or "https://" part, like this:
    import re

    text = "Visit my website at https://www.example.com/path/to/page.html"
    pattern = r"(?:https?://)([a-zA-Z0-9.-]+)(/[a-zA-Z0-9./_-]*)"
    matches = re.findall(pattern, text)
    print(matches)

In this example, the pattern (?:https?://) matches either "http://" or "https://", but doesn't capture it into a group. The rest of the pattern matches the domain name and path, and captures them into two groups. Because the pattern contains more than one capturing group, the findall() function returns a list of tuples, one tuple per match, each holding the text captured by those groups.
The output of this program is:
    [('www.example.com', '/path/to/page.html')]
As you can see, the findall() function has returned a list containing the two captured groups, but not the "http://" or "https://" part, since we used a non-capturing group for that.
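For comparison, here is a sketch of the same pattern with the scheme placed in an ordinary capturing group instead. The variable names `capturing_pattern` and `capturing_matches` are introduced here for illustration.

```python
import re

text = "Visit my website at https://www.example.com/path/to/page.html"

# Same pattern as before, except the scheme is now in a capturing group.
capturing_pattern = r"(https?://)([a-zA-Z0-9.-]+)(/[a-zA-Z0-9./_-]*)"

capturing_matches = re.findall(capturing_pattern, text)
print(capturing_matches)
# [('https://', 'www.example.com', '/path/to/page.html')]
```

Each tuple now has three elements, because the scheme is captured along with the domain name and path.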
Non-capturing groups can also be used with the pipe | operator to create alternate patterns. For example, the pattern (?:cat|dog)fish matches either "catfish" or "dogfish", but doesn't capture the "cat" or "dog" part into a group.
    import re

    text = "I have a catfish and a dogfish, but not a fish or a catdogfish"
    pattern = r"(?:cat|dog)fish"
    matches = re.findall(pattern, text)
    print(matches)

Because this pattern contains no capturing groups at all, the findall() function returns the full text of each match rather than the contents of a group.
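The output of this program is:
    ['catfish', 'dogfish', 'dogfish']
As you can see, the findall() function has returned the full text of every match, with no separate "cat" or "dog" part captured into a group. The standalone word "fish" is not matched. The second 'dogfish' comes from the end of "catdogfish": the pattern is not anchored to word boundaries, so it also matches inside a longer word.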
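The sketch below shows, under the same sample text, what changes when the alternation is put in a capturing group. It also shows one possible way to restrict matches to whole words with `\b`. The `\b` variant is an addition for illustration, not part of the original example.

```python
import re

text = "I have a catfish and a dogfish, but not a fish or a catdogfish"

# Capturing group: findall() returns only the captured "cat" or "dog",
# not the full match. The unanchored pattern also matches the "dogfish"
# inside "catdogfish".
captured_only = re.findall(r"(cat|dog)fish", text)
print(captured_only)  # ['cat', 'dog', 'dog']

# Non-capturing group with word boundaries: full matches, whole words only.
whole_words = re.findall(r"\b(?:cat|dog)fish\b", text)
print(whole_words)  # ['catfish', 'dogfish']
```

With a capturing group, the full matched text is lost from the findall() result, which is a common reason to prefer `(?:...)` when the group exists only for alternation.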