Numbered Groups in re Module – Complete Guide for Data Science 2026
Numbered groups are the default capturing groups created by plain parentheses (...) in regular expressions. Python’s re module automatically assigns them numbers starting from 1 (left to right). You can then reference them with match.group(1), \1 in substitutions, or as columns in pandas .str.extract(). Numbered groups are the simplest and most commonly used way to extract multiple structured fields from text in data science workflows.
TL;DR — Numbered Groups
- (pattern) → creates group 1, 2, 3…
- Access with match.group(1), \1, \2
- Return multiple columns with pandas .str.extract()
- Perfect for extracting IDs, dates, prices, emails in one pass
1. Basic Numbered Groups
import re
text = "Order ORD-98765 for $1,250.75 on 2026-03-19"
match = re.search(r"ORD-(\d+).*?\$(\d+(?:,\d+)?(?:\.\d+)?)", text)
print(match.group(0)) # full match
print(match.group(1)) # order ID (group 1)
print(match.group(2)) # amount (group 2)
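Beyond single-index access, match.groups() returns every numbered group at once as a tuple, and match.group() accepts several indices in one call. A short sketch reusing the order-line string above:

```python
import re

text = "Order ORD-98765 for $1,250.75 on 2026-03-19"
match = re.search(r"ORD-(\d+).*?\$(\d+(?:,\d+)?(?:\.\d+)?)", text)

# groups() returns all numbered groups as a tuple (group 0 is excluded)
print(match.groups())     # ('98765', '1,250.75')

# group() with multiple indices returns just those groups, in the order given
print(match.group(2, 1))  # ('1,250.75', '98765')
```

This is handy for unpacking: order_id, amount = match.groups().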
2. Real-World Data Science Examples with Pandas
import pandas as pd
df = pd.read_csv("logs.csv")
# Extract multiple fields using numbered groups
df[["order_id", "amount", "date"]] = df["log"].str.extract(
r"ORD-(d+).*?$(d+(?:,d+)?(?:.d+)?).*?(d{4}-d{2}-d{2})"
)
# One-line extraction of a single field via numbered group 1
df["email"] = df["log"].str.extract(r"(\S+@\S+\.\S+)")[0]
3. Numbered Groups in Substitution
# Reorder date using numbered groups
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2026-03-19"))  # 19/03/2026
# Swap first and last name
print(re.sub(r"(\w+)\s+(\w+)", r"\2, \1", "Alice Johnson"))  # Johnson, Alice
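When the replacement needs logic rather than simple reordering, re.sub also accepts a function: it receives the match object and its return value becomes the replacement, with numbered groups read via m.group(n). A small sketch (the email-like pattern here is deliberately simplified):

```python
import re

# Uppercase only the captured domain part of each email-like token
def upper_domain(m):
    return m.group(1) + "@" + m.group(2).upper()

text = "contact: alice@example.com, bob@data.org"
print(re.sub(r"(\w+)@([\w.]+)", upper_domain, text))
# contact: alice@EXAMPLE.COM, bob@DATA.ORG
```

This callable form is the usual escape hatch when \1-style templates can't express the transformation (case changes, arithmetic on captured numbers, lookups).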
4. Best Practices in 2026
- Use numbered groups when you need simple positional access to captured values
- Switch to named groups (?P<name>...) for complex patterns with many groups
- Use non-capturing groups (?:...) when you only need grouping
- Pre-compile patterns that contain several numbered groups
- Combine with pandas .str.extract() for vectorized multi-column extraction
Conclusion
Numbered groups are the foundation of structured text extraction in Python’s re module. In 2026 data science projects, they let you pull multiple fields (IDs, amounts, dates, emails…) in a single clean pattern and reference them instantly with group(1), \1, or pandas columns. Use numbered groups for simple cases, named groups for readability in complex patterns, and you’ll build faster, more maintainable text-processing pipelines than ever.
Next steps:
- Take one of your current regex patterns and convert it to use numbered groups for multi-field extraction