How to Build a Generator Function in Python – Step-by-Step Guide for Data Science 2026
Building your own generator functions with yield is one of the most valuable skills for handling large-scale data in Python. Unlike regular functions that return once, generator functions can pause and resume, producing values one at a time with minimal memory usage.
TL;DR — Core Rules
- Use `def` and `yield` instead of `return`
- The function automatically becomes a generator when it contains `yield`
- Call it like a normal function — it returns a generator object
1. Simple Generator Function
```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

# Usage
for number in count_up_to(5):
    print(number)
```
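Calling a generator function does not run its body. It returns a generator object, which you can advance manually with `next()` to see the pause-and-resume behavior directly:

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

gen = count_up_to(3)   # no code has run yet; gen is a generator object
print(next(gen))       # runs until the first yield -> 1
print(next(gen))       # resumes where it paused -> 2
print(list(gen))       # consumes whatever remains -> [3]
# Calling next(gen) again would raise StopIteration
```

This is why a generator can only be iterated once: after the values are consumed, the object is exhausted and a fresh call to `count_up_to()` is needed.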
2. Real Data Science Generator Functions
```python
import math

import pandas as pd

# Example 1: Row-by-row processor
def process_sales_rows(df):
    for row in df.itertuples():
        profit = row.amount * 0.25
        category = "Premium" if profit > 500 else "Standard"
        yield {
            "customer_id": row.customer_id,
            "amount": row.amount,
            "profit": round(profit, 2),
            "category": category,
        }

# Example 2: Chunked file reader with enrichment
def read_and_enrich_large_csv(file_path, chunk_size=50000):
    for chunk in pd.read_csv(file_path, chunksize=chunk_size):
        chunk["profit"] = chunk["amount"] * 0.25
        chunk["log_amount"] = chunk["amount"].apply(
            lambda x: round(math.log(x), 2) if x > 0 else 0
        )
        yield chunk

# Usage
for enriched_chunk in read_and_enrich_large_csv("huge_sales.csv"):
    print(f"Processed chunk with {len(enriched_chunk)} rows")
    # Save or further process this chunk
```
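To see the row-by-row processor in action without a real sales file, you can feed it a small hypothetical DataFrame that has the `customer_id` and `amount` columns the generator expects (the sample values below are made up for illustration):

```python
import pandas as pd

def process_sales_rows(df):
    # Same generator as above: yields one enriched dict per row
    for row in df.itertuples():
        profit = row.amount * 0.25
        category = "Premium" if profit > 500 else "Standard"
        yield {
            "customer_id": row.customer_id,
            "amount": row.amount,
            "profit": round(profit, 2),
            "category": category,
        }

# Hypothetical sample data with the columns the generator expects
sales = pd.DataFrame({
    "customer_id": [101, 102],
    "amount": [3000, 400],
})

for record in process_sales_rows(sales):
    print(record)
# 3000 * 0.25 = 750 -> "Premium"; 400 * 0.25 = 100 -> "Standard"
```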
3. Advanced Generator with Multiple Yields & State
```python
def batch_processor(data, batch_size=100):
    batch = []
    for item in data:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:          # don't drop the final partial batch
        yield batch

# Usage
for batch in batch_processor(large_dataset):
    print(f"Processing batch of {len(batch)} items")
```
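With a concrete input you can verify the batching behavior, including the trailing partial batch that the final `if batch:` check emits:

```python
def batch_processor(data, batch_size=100):
    # Same generator as above: accumulate items, yield full batches,
    # then yield any leftover partial batch at the end
    batch = []
    for item in data:
        batch.append(item)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# 250 items in batches of 100 -> sizes 100, 100, 50
sizes = [len(b) for b in batch_processor(range(250), batch_size=100)]
print(sizes)  # [100, 100, 50]
```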
4. Best Practices for Building Generators in 2026
- Keep generator functions focused on one clear responsibility
- Use descriptive names and document what is being yielded
- Prefer `itertuples()` over `iterrows()` inside generators for speed
- Use `yield from` when delegating to another generator
- Test generators with `next()` and small inputs first
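As a minimal sketch of the delegation practice, `yield from` hands control to a sub-generator without an explicit `for` loop (reusing the `count_up_to` generator from earlier; `count_both` is a made-up name for illustration):

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

def count_both(a, b):
    # Delegate to two sub-generators instead of looping over each one
    yield from count_up_to(a)
    yield from count_up_to(b)

print(list(count_both(2, 3)))  # [1, 2, 1, 2, 3]
```

Besides being shorter than `for x in sub(): yield x`, `yield from` also forwards values sent into the generator and propagates return values, which matters once pipelines get more advanced.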
Conclusion
Building custom generator functions is a key skill for modern data science. In 2026, they allow you to process massive datasets, build reusable pipelines, and keep memory usage low. Start simple, practice the yield pattern, and gradually move your data processing code from full lists to powerful, lazy generators.
Next steps:
- Take one of your existing data processing scripts and rewrite the core loop as a custom generator function