Defining Functions in Python – Best Practices for Data Science 2026
Well-written functions are the backbone of clean, reusable, and maintainable data science code. In 2026, defining them with type hints, docstrings, and a single clear responsibility keeps data pipelines modular and easy to test.
TL;DR — Modern Function Definition Best Practices
- Use type hints for parameters and return values
- Always include a clear docstring
- Keep functions small and focused (single responsibility)
- Use default arguments wisely
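One common way default arguments go wrong is a mutable default such as a list, which Python creates once at definition time and shares across calls. The sketch below contrasts that with a `None` sentinel; `add_tag_buggy` and `add_tag` are hypothetical names used only for illustration.

```python
from typing import List, Optional


def add_tag_buggy(tag: str, tags: List[str] = []) -> List[str]:
    """Append a tag. The default list is built once and reused by every call."""
    tags.append(tag)
    return tags


def add_tag(tag: str, tags: Optional[List[str]] = None) -> List[str]:
    """Append a tag. A fresh list is created on each call when none is passed."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


first_buggy = add_tag_buggy("a")
second_buggy = add_tag_buggy("b")  # same list object as first_buggy

first = add_tag("a")
second = add_tag("b")  # independent of first
```

With the sentinel version, each call that omits `tags` starts from an empty list, so earlier calls cannot leak state into later ones.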
1. Modern Function Definition with Type Hints
from typing import List, Optional
from datetime import datetime

def calculate_monthly_revenue(
    transactions: List[dict],
    month: int,
    year: int = 2026
) -> float:
    """
    Calculate total revenue for a given month and year.

    Args:
        transactions: List of transaction dictionaries, each with an
            ISO-format "date" string and an optional numeric "amount"
        month: Month number (1-12)
        year: Year (default: 2026)

    Returns:
        Total revenue as float, rounded to 2 decimal places
    """
    total = 0.0
    for tx in transactions:
        # Parse the ISO date string, e.g. "2026-03-15"
        tx_date = datetime.fromisoformat(tx["date"])
        if tx_date.month == month and tx_date.year == year:
            # Transactions without an "amount" key contribute nothing
            total += tx.get("amount", 0.0)
    return round(total, 2)
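The hard-coded `year: int = 2026` default goes stale as soon as the calendar rolls over. If the intent is for `year` to fall back to the current year, one option is a `None` sentinel resolved at call time. The following is a minimal sketch under that assumption; the function name `calculate_monthly_revenue_current_year`, the month validation, and the sample data are illustrative additions, not part of the original function.

```python
from datetime import date, datetime
from typing import List, Optional


def calculate_monthly_revenue_current_year(
    transactions: List[dict],
    month: int,
    year: Optional[int] = None,
) -> float:
    """Variant in which a missing year means the current calendar year."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1-12, got {month}")
    if year is None:
        # Resolved on every call, not once at definition time
        year = date.today().year

    total = 0.0
    for tx in transactions:
        tx_date = datetime.fromisoformat(tx["date"])
        if tx_date.month == month and tx_date.year == year:
            total += tx.get("amount", 0.0)
    return round(total, 2)


# Made-up sample data for illustration
sample = [
    {"date": "2025-03-05", "amount": 120.50},
    {"date": "2025-03-20", "amount": 79.50},
    {"date": "2025-04-01", "amount": 10.00},
    {"date": "2024-03-05", "amount": 999.00},
]
march_2025 = calculate_monthly_revenue_current_year(sample, month=3, year=2025)
```

Passing `year` explicitly, as in the last line, keeps results reproducible in tests and backfills, while omitting it suits ad hoc use.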
2. Real-World Data Science Function Example
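from typing import Optional

import pandas as pd

def clean_and_summarize_sales(
    df: pd.DataFrame,
    min_amount: float = 0.0,
    region: Optional[str] = None
) -> pd.DataFrame:
    """
    Clean sales data and return summary statistics.

    Args:
        df: Sales data with "amount", "region", "customer_id", and a
            datetime-typed "order_date" column
        min_amount: Minimum amount a row needs to be kept
        region: If given, keep only rows for this region

    Returns:
        Per-region summary of amount (sum, mean, count) and unique customers
    """
    # Filter and clean
    cleaned = df[df["amount"] >= min_amount].copy()
    if region:
        cleaned = cleaned[cleaned["region"] == region]
    # Add useful columns (requires order_date to be a datetime column)
    cleaned = cleaned.assign(
        year=cleaned["order_date"].dt.year,
        month_name=cleaned["order_date"].dt.month_name()
    )
    # Return summary
    summary = cleaned.groupby("region").agg({
        "amount": ["sum", "mean", "count"],
        "customer_id": "nunique"
    }).round(2)
    return summary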
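The dictionary form of `agg` used above returns two-level column labels such as `("amount", "sum")`, which can be awkward for downstream code. As a possible alternative, pandas named aggregation produces flat, descriptive column names. The sketch below assumes the same `region`, `amount`, and `customer_id` columns; `summarize_by_region`, the output column names, and the sample data are hypothetical.

```python
import pandas as pd


def summarize_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize sales per region with flat, descriptive column names."""
    return df.groupby("region").agg(
        total_amount=("amount", "sum"),
        mean_amount=("amount", "mean"),
        order_count=("amount", "count"),
        unique_customers=("customer_id", "nunique"),
    ).round(2)


# Made-up sample data for illustration
sales = pd.DataFrame({
    "region": ["North", "North", "South"],
    "amount": [100.0, 50.0, 30.0],
    "customer_id": [1, 1, 2],
})
summary = summarize_by_region(sales)
```

Each keyword becomes one output column, so callers can write `summary["total_amount"]` instead of indexing with a tuple.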
3. Best Practices for Data Science Functions in 2026
- Always add clear, informative docstrings (Google or NumPy style)
- Use type hints for all parameters and return values
- Keep functions focused on doing one thing well
- Use descriptive parameter and function names
- Provide sensible default values where appropriate
- Return clean, consistent data types
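Several of the points above can be seen together in a small sketch: a NumPy-style docstring, full type hints, a sensible default, and a return value that is always the same type. The function `top_customers` and its data are hypothetical, chosen only to illustrate the list.

```python
from typing import Dict, List


def top_customers(totals: Dict[str, float], n: int = 3) -> List[str]:
    """
    Return the IDs of the highest-spending customers.

    Parameters
    ----------
    totals : dict of str to float
        Mapping of customer ID to total spend.
    n : int, default 3
        Maximum number of IDs to return.

    Returns
    -------
    list of str
        Customer IDs ordered by spend, highest first. Always a list,
        empty when `totals` is empty, never None.
    """
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:n]


# Made-up sample data for illustration
spend = {"c1": 250.0, "c2": 900.0, "c3": 40.0, "c4": 610.0}
best_two = top_customers(spend, n=2)
nobody = top_customers({})
```

Because the empty case returns `[]` rather than `None`, callers can loop over the result without a separate check.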
Conclusion
Writing good functions is one of the most important skills in data science. In 2026, the standard is to define functions with type hints, comprehensive docstrings, and clear single responsibility. Well-designed functions make your data science code reusable, testable, and much easier to maintain as projects grow.
Next steps:
- Review your current data science scripts and refactor repeated logic into well-documented, typed functions
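As a rough illustration of that refactoring step, suppose a script repeats the same min-max scaling arithmetic inline for several datasets. A minimal sketch of pulling it into one typed, documented function follows; `normalize_amounts` and the sample values are hypothetical.

```python
from typing import List


def normalize_amounts(amounts: List[float]) -> List[float]:
    """Scale amounts to the 0-1 range; return zeros if all values are equal."""
    if not amounts:
        return []
    low, high = min(amounts), max(amounts)
    if high == low:
        # Avoid division by zero when every value is identical
        return [0.0 for _ in amounts]
    return [(a - low) / (high - low) for a in amounts]


# One function now serves every dataset that needs the same scaling
online_scaled = normalize_amounts([10.0, 20.0, 30.0])
store_scaled = normalize_amounts([5.0, 5.0])
```

Edge cases such as empty input or identical values are then handled in one place instead of in every copy of the logic.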