Quantifiers in re Module – Complete Guide for Data Science 2026
Quantifiers are the heart of regular expressions in Python’s re module. They let you specify how many times a character, group, or pattern should repeat, from zero times to an unlimited number. In data science, quantifiers power tasks such as cleaning repeated punctuation in logs, extracting variable-length numbers, detecting digit sequences in reports, and building robust feature-extraction pipelines. Mastering them is essential for writing concise, efficient regex in 2026.
TL;DR — All Quantifiers in re
- * → zero or more
- + → one or more
- ? → zero or one
- {n} → exactly n times
- {n,} → n or more
- {n,m} → between n and m times
- ? after any quantifier → non-greedy (minimal match)
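As a quick sanity check of the summary above, the following sketch runs each quantifier through re.fullmatch, which succeeds only when the whole string fits the pattern. The sample patterns and strings are made up for illustration.

```python
import re

# (pattern, candidate string, should the WHOLE string match?)
# All samples are invented for this illustration.
checks = [
    (r"ab*", "a", True),           # * allows zero b's
    (r"ab+", "a", False),          # + needs at least one b
    (r"ab+", "abbb", True),
    (r"colou?r", "color", True),   # ? makes the u optional
    (r"colou?r", "colour", True),
    (r"colou?r", "colouur", False),
    (r"\d{3}", "123", True),       # {n}: exactly three digits
    (r"\d{3}", "1234", False),
    (r"\d{2,}", "1234", True),     # {n,}: two or more digits
    (r"\d{2,}", "1", False),
    (r"\d{2,4}", "12", True),      # {n,m}: between two and four digits
    (r"\d{2,4}", "12345", False),
]

results = []
for pattern, candidate, expected in checks:
    matched = re.fullmatch(pattern, candidate) is not None
    results.append((pattern, candidate, matched, expected))
    print(f"{pattern!r:12} {candidate!r:10} matched={matched}")
```

re.fullmatch is used here instead of re.search so that a quantifier's upper bound is visible: with re.search, \d{3} would still find a match inside "1234".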
1. Basic Quantifiers
import re
text = "aaaabbbccc!!! 2026-03-19 order-98765"
print(re.findall(r"a*", text)) # zero or more; also returns empty strings wherever no "a" is present
print(re.findall(r"b+", text)) # one or more
print(re.findall(r"!{2,}", text)) # two or more
print(re.findall(r"\d{4}", text)) # exactly 4 digits per match
print(re.findall(r"order-\d{1,5}", text)) # 1 to 5 digits
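Two behaviors of these basic quantifiers are easy to miss, so here is a small sketch using the same sample text. First, a* can match the empty string, so re.findall returns empty results wherever no "a" is present. Second, {4} limits how many digits one match consumes, not how long the surrounding run of digits is. The \b word boundary used below is an addition for illustration and is not part of the examples above.

```python
import re

text = "aaaabbbccc!!! 2026-03-19 order-98765"

# a* succeeds everywhere, so filter out the empty matches.
non_empty = [m for m in re.findall(r"a*", text) if m]
print(non_empty)  # ['aaaa']

# \d{4} also picks four digits out of the five-digit order number.
unbounded = re.findall(r"\d{4}", text)
print(unbounded)  # ['2026', '9876']

# Word boundaries restrict the match to runs of exactly four digits.
bounded = re.findall(r"\b\d{4}\b", text)
print(bounded)  # ['2026']
```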
2. Real-World Data Science Examples with Pandas
import pandas as pd
df = pd.read_csv("logs.csv")
# Collapse runs of two or more exclamation marks into a single "!"
df["clean"] = df["log"].str.replace(r"!{2,}", "!", regex=True)
# Extract the first letter that appears three or more times in a row (\1 refers back to group 1)
df["repeated"] = df["log"].str.extract(r"([a-zA-Z])\1{2,}", expand=False)
# Match variable-length order IDs (1 to 6 digits)
df["order_id"] = df["log"].str.extract(r"order-(\d{1,6})", expand=False)
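To try these operations without a logs.csv file, here is a minimal sketch that builds a small in-memory DataFrame instead. The three log lines are invented for this example, and the "log" column name is assumed to match the file used above.

```python
import pandas as pd

# Hypothetical log lines standing in for logs.csv.
df = pd.DataFrame({
    "log": [
        "Payment failed!!! order-98765",
        "Loading... pleeease wait order-12",
        "no order here",
    ]
})

# Collapse runs of two or more "!" into one.
df["clean"] = df["log"].str.replace(r"!{2,}", "!", regex=True)

# First letter repeated three or more times in a row; NaN when there is none.
df["repeated"] = df["log"].str.extract(r"([a-zA-Z])\1{2,}", expand=False)

# Order ID of 1 to 6 digits; NaN when the line has no order ID.
df["order_id"] = df["log"].str.extract(r"order-(\d{1,6})", expand=False)

print(df[["clean", "repeated", "order_id"]])
```

Passing expand=False makes .str.extract() return a Series for a single capture group, which keeps the column assignment explicit. Rows without a match come back as NaN, so downstream code should expect missing values.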
3. Greedy vs Non-Greedy Quantifiers
# Greedy (default)
print(re.search(r"a+", "aaaaaaa")) # matches the whole string
# Non-greedy
print(re.search(r"a+?", "aaaaaaa")) # matches only one "a"
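The difference matters most when a pattern has a closing delimiter. The sketch below, on an invented key="value" line, shows a greedy .* running to the last quote while the non-greedy .*? stops at the first one.

```python
import re

# Hypothetical key="value" line, invented for this example.
line = 'user="alice" action="login"'

# Greedy: .* grabs as much as possible, then backtracks to the LAST quote.
greedy = re.findall(r'"(.*)"', line)
print(greedy)  # ['alice" action="login']

# Non-greedy: .*? stops at the FIRST closing quote.
lazy = re.findall(r'"(.*?)"', line)
print(lazy)  # ['alice', 'login']

# The single-pattern case from above, reading the matched text with .group().
print(re.search(r"a+", "aaaaaaa").group())   # aaaaaaa
print(re.search(r"a+?", "aaaaaaa").group())  # a
```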
4. Best Practices in 2026
- Use raw strings r"..." for every pattern
- Prefer specific {n,m} over * or + when possible for speed and clarity
- Add ? for non-greedy matching when you want the smallest possible match
- Pre-compile patterns used repeatedly with re.compile()
- Combine with pandas .str.extract() and .str.replace(regex=True) for vectorized operations
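Two of these practices, raw strings and pre-compiled patterns, can be sketched together as follows. The log lines and the helper name extract_order_id are hypothetical and exist only for this illustration.

```python
import re

# Without the r prefix, "\b" is a single backspace character;
# with it, r"\b" is the two-character regex word-boundary token.
print(len("\b"), len(r"\b"))  # 1 2

# Hypothetical log lines, invented for this example.
LOG_LINES = [
    "2026-03-19 order-98765 shipped",
    "2026-03-20 order-42 delayed",
    "2026-03-21 heartbeat",
]

# Compile once, then reuse the pattern object for every line.
ORDER_ID = re.compile(r"order-(\d{1,6})")


def extract_order_id(line):
    """Return the order ID digits as a string, or None if the line has none."""
    match = ORDER_ID.search(line)
    return match.group(1) if match else None


order_ids = [extract_order_id(line) for line in LOG_LINES]
print(order_ids)  # ['98765', '42', None]
```

Note that the bounded quantifier {1,6} caps what a single match consumes: for an ID longer than six digits, only the first six are captured.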
Conclusion
Quantifiers are among the most frequently used features of the re module when working with real-world text in data science. In 2026, mastering *, +, ?, {n,m} and greedy/non-greedy behavior lets you write concise, fast, and precise patterns for cleaning, extracting, and transforming data at scale. These techniques complete the core regex toolkit and prepare you for advanced text processing pipelines.
Next steps:
- Review one of your current regex patterns and optimize it using the full set of quantifiers shown above