What is Iteration in Python – Understanding Iterables and Iterators for Data Science 2026
Iteration is one of the most fundamental concepts in Python and is used constantly in data science. Understanding the difference between **iterables** and **iterators** helps you write more efficient, memory-friendly, and Pythonic code when working with large datasets.
TL;DR — Key Concepts
- Iterable: Any object you can loop over (lists, tuples, strings, DataFrames, dictionaries, etc.)
- Iterator: An object that remembers its position and returns one value at a time using `next()`
- `for` loops work on iterables by automatically creating an iterator
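The last point can be sketched in plain Python. The loop below is roughly what a `for` statement does behind the scenes; the helper name `manual_for` is hypothetical and exists only for illustration.

```python
def manual_for(iterable):
    """Collect items in the order a for loop would visit them (illustrative helper)."""
    iterator = iter(iterable)      # a for loop calls iter() once to get an iterator
    seen = []
    while True:
        try:
            item = next(iterator)  # then it calls next() repeatedly
        except StopIteration:      # StopIteration is the signal to end the loop
            break
        seen.append(item)
    return seen


print(manual_for("abc"))
print(manual_for([10, 20]))
```

Because the loop only relies on `iter()` and `next()`, it works the same way for strings, lists, and any other iterable.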
1. Iterables vs Iterators – Simple Explanation
numbers = [1, 2, 3, 4, 5] # This is an iterable
# You can loop over it multiple times
for n in numbers:
    print(n)
# Creating an iterator from an iterable
iterator = iter(numbers) # Now it's an iterator
print(next(iterator)) # 1
print(next(iterator)) # 2
print(next(iterator)) # 3
# Once every value has been consumed, the next call to next(iterator) raises StopIteration
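To make the exhaustion behaviour concrete, here is a small sketch using a shorter list. It shows that an iterator is single-pass while the underlying list can be iterated again, and that `next()` accepts an optional default value to avoid the exception.

```python
numbers = [1, 2, 3]

iterator = iter(numbers)
first_pass = list(iterator)    # consumes every value
second_pass = list(iterator)   # the same iterator is now empty

# The list itself is unaffected; each iter() call returns a fresh iterator
fresh_pass = list(iter(numbers))

# next() on an exhausted iterator raises StopIteration...
try:
    next(iterator)
    raised = False
except StopIteration:
    raised = True

# ...unless a default is passed as the second argument
fallback = next(iterator, None)

print(first_pass, second_pass, fresh_pass, raised, fallback)
```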
2. Real-World Data Science Examples
import pandas as pd
df = pd.read_csv("sales_data.csv")
# 1. Iterating over DataFrame rows (not recommended for large data)
for index, row in df.iterrows(): # iterrows() returns an iterator
    if row["amount"] > 1000:
        print(f"High value order: {row['customer_id']}")
# 2. Better: Using itertuples() - faster iterator
for row in df.itertuples():
    if row.amount > 1000:
        print(f"High value: {row.customer_id}")
# 3. Iterating over column names (iterable)
for col in df.columns:
    print(f"Processing column: {col}")
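For comparison, the same high-value filter can be written as a vectorized operation. This is a minimal sketch: the small in-memory DataFrame is a made-up stand-in for `sales_data.csv`, reusing the `customer_id` and `amount` column names from the examples above.

```python
import pandas as pd

# Hypothetical stand-in for sales_data.csv
df = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4"],
    "amount": [250, 1500, 1000, 3200],
})

# The comparison runs over the whole column at once, with no Python-level row loop
high_value = df[df["amount"] > 1000]

# Only the filtered result is looped over, and only for printing
for customer_id in high_value["customer_id"]:
    print(f"High value order: {customer_id}")
```

The boolean mask does the filtering inside pandas, so the explicit loop only touches the rows that already matched.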
3. Best Practices for Iteration in Data Science 2026
- Avoid `iterrows()` on large DataFrames; use `itertuples()` or vectorized operations instead
- Use `for` loops with iterables for readability
- Use generators (`yield`) when working with very large datasets to save memory
- Prefer built-in functions like `enumerate()`, `zip()`, and `map()` over manual iteration
- Understand that many Pandas objects are iterable: for example, iterating over the result of `groupby()` or `resample()` yields one (key, group) pair at a time
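The generator and built-in recommendations can be combined in a short sketch. The function name `high_value_orders` and the sample data are hypothetical; the point is that values are produced one at a time instead of being stored in an intermediate list.

```python
def high_value_orders(rows, threshold=1000):
    """Lazily yield (customer_id, amount) pairs above the threshold."""
    for customer_id, amount in rows:
        if amount > threshold:
            yield customer_id, amount   # hands back one result, then pauses


customer_ids = ["C1", "C2", "C3", "C4"]
amounts = [250, 1500, 1000, 3200]

# zip() pairs the two sequences lazily; enumerate() adds a running counter
orders = high_value_orders(zip(customer_ids, amounts))
report = [
    f"{rank}. {cid}: {amount}"
    for rank, (cid, amount) in enumerate(orders, start=1)
]
print(report)
```

Like any iterator, the generator `orders` is exhausted after the report is built, so call `high_value_orders()` again if a second pass is needed.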
Conclusion
Iteration is everywhere in data science — from looping through rows and columns to processing large files and model training. In 2026, the key is to understand the difference between iterables and iterators and to choose the most efficient method for each situation. Writing clean, memory-efficient iteration code is a hallmark of experienced data scientists.
Next steps:
- Review your current loops over DataFrames and replace slow `iterrows()` with faster alternatives like `itertuples()` or vectorized operations