One of the most useful things LangSmith offers in 2026 is help catching anomalies in agentic AI systems, such as sudden cost spikes, unusual agent behavior, rising error rates, or performance degradation, before they become major problems.
This practical guide shows you how to set up effective anomaly detection using LangSmith for your CrewAI and LangGraph agents as of March 24, 2026.
Why Anomaly Detection is Critical for Agentic AI
Agentic systems are highly dynamic. Small changes in prompts, tools, or external APIs can cause:
- Sudden cost explosions
- Unexpected increases in token usage
- Spikes in error rates
- Degraded agent performance
- Unusual tool calling patterns
Setting Up LangSmith Anomaly Detection
1. Enable LangSmith with Proper Project Configuration
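import os

# Turn tracing on and send runs to a dedicated production project.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Placeholder key: load the real one from a secret store, not from source code.
os.environ["LANGCHAIN_API_KEY"] = "lsv2_your_key_here"
os.environ["LANGCHAIN_PROJECT"] = "agentic-ai-production-v1"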
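Before relying on traces for alerting, it can help to confirm that the three settings above are actually present in the process environment. The helper below is a minimal sketch; `missing_tracing_settings` and `REQUIRED_VARS` are names introduced here for illustration and are not part of the LangSmith SDK.

```python
import os
from typing import List, Mapping, Optional

# The three variables set in the configuration step above.
REQUIRED_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_tracing_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_tracing_settings()
    if missing:
        print("Tracing is not fully configured; missing:", ", ".join(missing))
```

Running a check like this at service startup surfaces a missing key immediately, rather than later as a silent gap in the monitoring data.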
2. Create Custom Anomaly Detection Rules
from langsmith import Client
import datetime
client = Client()
def setup_anomaly_alerts():
    # NOTE: create_alert_rule and the parameters below are shown as an
    # illustrative interface. Confirm that your installed langsmith SDK
    # exposes this method before relying on it; if it does not, configure
    # the equivalent rules in LangSmith's alert settings instead.
    project = "agentic-ai-production-v1"

    # Create alert rules in LangSmith
    client.create_alert_rule(
        project_name=project,
        name="Cost Spike Alert",
        metric="total_cost",
        condition="greater_than",
        threshold=5.0,  # Alert if cost > $5 per run
        window="1h",
        notification_channels=["slack", "email"],
    )

    client.create_alert_rule(
        project_name=project,
        name="Error Rate Spike",
        metric="error_rate",
        condition="greater_than",
        threshold=0.15,  # Alert if error rate > 15%
        window="30m",
    )

    client.create_alert_rule(
        project_name=project,
        name="Token Usage Anomaly",
        metric="total_tokens",
        condition="greater_than_percentile",
        threshold=95,  # Alert if above 95th percentile
        window="1h",
    )
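To make the three rule types concrete, the sketch below evaluates the same conditions locally on in-memory data: a fixed cost limit, an error-rate limit, and a percentile cutoff. It does not call LangSmith. The `evaluate_rules` function and the dictionary shape of each run are assumptions made for illustration; the thresholds mirror the ones used above.

```python
from statistics import quantiles


def evaluate_rules(window_runs, token_history,
                   cost_limit=5.0, error_rate_limit=0.15, token_pct=95):
    """Check one time window of runs against the three rules.

    window_runs: list of dicts with "cost", "tokens", and "error" keys
    (a hypothetical in-memory shape, not a LangSmith type).
    token_history: token counts from earlier runs, used for the percentile.
    """
    alerts = {}

    # Rule 1: any single run costing more than the fixed limit.
    costly = [r for r in window_runs if r["cost"] > cost_limit]
    if costly:
        alerts["Cost Spike Alert"] = len(costly)

    # Rule 2: share of failed runs in the window.
    if window_runs:
        rate = sum(1 for r in window_runs if r["error"]) / len(window_runs)
        if rate > error_rate_limit:
            alerts["Error Rate Spike"] = rate

    # Rule 3: runs above the chosen percentile of historical token counts.
    if len(token_history) >= 2:
        cutoff = quantiles(token_history, n=100, method="inclusive")[token_pct - 1]
        heavy = [r for r in window_runs if r["tokens"] > cutoff]
        if heavy:
            alerts["Token Usage Anomaly"] = len(heavy)

    return alerts
```

The percentile cutoff is computed from earlier runs rather than from the window being checked. A cutoff taken from the window itself would flag roughly the top 5% of runs every time, regardless of whether anything unusual happened.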
3. Advanced Custom Anomaly Detection with Python
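from datetime import datetime, timedelta, timezone

import pandas as pd
from langsmith import Client

client = Client()

def detect_cost_anomalies(days_back=7, window=10):
    # Root runs only, so nested child runs are not counted twice
    # (assumes the root run's total_cost covers the whole trace).
    runs = client.list_runs(
        project_name="agentic-ai-production-v1",
        start_time=datetime.now(timezone.utc) - timedelta(days=days_back),
        is_root=True,
    )
    df = pd.DataFrame([{
        "run_id": run.id,
        "cost": float(run.total_cost or 0),
        "timestamp": run.start_time,
    } for run in runs])
    if df.empty:
        return df  # No runs in the period, nothing to analyze

    # Sort chronologically so the rolling window follows run order
    df = df.sort_values("timestamp").reset_index(drop=True)

    # Calculate rolling mean and standard deviation of the previous `window` runs.
    # shift(1) keeps the current run out of its own baseline: a value can be at
    # most (n - 1) / sqrt(n) sample standard deviations from the mean of a window
    # that contains it (about 2.85 for n = 10), so a 3-sigma test would never fire.
    baseline = df["cost"].shift(1).rolling(window=window)
    df["rolling_mean"] = baseline.mean()
    df["rolling_std"] = baseline.std()

    # Flag anomalies (more than 3 standard deviations from the baseline)
    df["is_anomaly"] = (df["cost"] - df["rolling_mean"]).abs() > 3 * df["rolling_std"]
    anomalies = df[df["is_anomaly"]]

    if not anomalies.empty:
        print(f"🚨 Found {len(anomalies)} cost anomalies in the last {days_back} days")
        for _, row in anomalies.iterrows():
            print(f" Run {row['run_id']}: ${row['cost']:.4f} (expected ~${row['rolling_mean']:.4f})")
    return anomalies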
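The rolling check can also be tried on sample data without calling the API. The sketch below is a pure-Python version that compares each cost with the mean and sample standard deviation of the preceding runs; `flag_spikes` and the sample costs are illustrative assumptions, not LangSmith output.

```python
from statistics import mean, stdev


def flag_spikes(costs, window=10, sigmas=3.0):
    """Return indexes whose cost is more than `sigmas` sample standard
    deviations away from the mean of the preceding `window` costs."""
    flagged = []
    for i in range(window, len(costs)):
        baseline = costs[i - window:i]  # previous runs only, current run excluded
        mu, sd = mean(baseline), stdev(baseline)
        if sd > 0 and abs(costs[i] - mu) > sigmas * sd:
            flagged.append(i)
    return flagged


# Made-up per-run costs in dollars, oldest first, with one obvious spike.
sample_costs = [0.10, 0.12, 0.11, 0.09, 0.10, 0.13, 0.11, 0.10, 0.12, 0.11, 0.95, 0.10]
print(flag_spikes(sample_costs))
```

Leaving the current value out of the baseline matters. A value can sit at most (n - 1) / sqrt(n) sample standard deviations from the mean of a window that contains it, about 2.85 for n = 10, so a 3-sigma test against such a window cannot trigger.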
Recommended Anomaly Detection Rules for 2026
- Cost Per Run Alert: Trigger when a single run exceeds $3–$5
- Daily Budget Alert: Notify when daily spend exceeds 80% of allocated budget
- Error Rate Spike: Alert when error rate jumps above 10–15%
- Token Usage Anomaly: Alert when a run's token count exceeds the 95th percentile of recent runs
- Tool Usage Anomaly: Monitor for unexpected spikes in expensive tool calls
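The daily budget rule can be sketched as a small aggregation: sum the run costs per day and report the days whose total passes 80% of the budget. The function name `days_over_budget`, the `(day, cost)` input shape, and the sample figures are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def days_over_budget(run_costs, daily_budget, warn_fraction=0.8):
    """Return {day: total_spend} for days whose spend exceeds
    warn_fraction * daily_budget.

    run_costs: iterable of (day, cost) pairs, one per run (hypothetical shape).
    """
    totals = defaultdict(float)
    for day, cost in run_costs:
        totals[day] += cost
    limit = warn_fraction * daily_budget
    return {day: total for day, total in sorted(totals.items()) if total > limit}


# Made-up example: a $100 daily budget, so the warning line sits at $80.
runs = [
    (date(2026, 3, 23), 30.0),
    (date(2026, 3, 23), 55.0),
    (date(2026, 3, 24), 20.0),
]
print(days_over_budget(runs, daily_budget=100.0))
```

Warning at 80% rather than 100% leaves time to react before the budget is actually exhausted.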
Best Practices for Anomaly Detection
- Start with conservative thresholds and tune them over time
- Combine statistical methods with rule-based alerts
- Route critical alerts to PagerDuty or Opsgenie
- Review anomalies weekly to improve agent prompts and tools
- Use LangSmith’s built-in evaluation features alongside custom rules
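Two of the practices above, tiered thresholds and routing critical alerts to an on-call tool, can be combined in a small lookup. The sketch below is one possible shape under stated assumptions: the channel names are hypothetical labels rather than real integrations, and the $3 and $5 limits are taken from the cost-per-run range recommended earlier.

```python
# Hypothetical channel names; map them to whatever integrations you use.
ROUTES = {
    "critical": ["pagerduty", "slack"],
    "warning": ["slack"],
}


def classify_cost(cost_per_run, warn_at=3.0, critical_at=5.0):
    """Map a per-run cost to a severity, or None if it is within limits."""
    if cost_per_run > critical_at:
        return "critical"
    if cost_per_run > warn_at:
        return "warning"
    return None


def channels_for(cost_per_run):
    """Return the channels that should be notified for this cost."""
    severity = classify_cost(cost_per_run)
    return ROUTES.get(severity, [])
```

Keeping the thresholds as function arguments makes them easy to tune over time without touching the routing table.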
Last updated: March 24, 2026. Combining LangSmith monitoring with custom statistical checks is a practical way to keep agentic AI systems healthy and cost-effective in production.
Pro Tip: Set up cost anomaly alerts before you scale your agent system. Early detection can save thousands of dollars per month.