Problem: Your AI Agent Gets Stuck in Infinite Loops
Your autonomous coding agent runs the same failed test 47 times, edits the same file repeatedly, or keeps searching for solutions it already tried. You're burning API credits and getting no useful work done.
You'll learn:
- Why AI agents loop on coding tasks
- Three debugging patterns to detect loops early
- Production-ready safeguards for autonomous systems
Time: 25 min | Level: Advanced
Why This Happens
AI agents lack implicit memory of their action history within a single task. Each decision uses context from the conversation, but the model doesn't inherently "know" it already tried something unless that information is explicitly in the prompt.
Common symptoms:
- Agent runs `pytest` → sees failure → edits code → runs `pytest` with identical results
- Searches documentation for the same query 5+ times
- Applies the same fix pattern to different files expecting different outcomes
- Burns through token limits or API rate limits
Root cause: No built-in loop detection between the orchestration layer (your code) and the LLM's reasoning layer.
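The failure mode is easy to reproduce with a stub. The "model" below is a placeholder, not a real LLM call, but it behaves the same way a real model does when no action history reaches its prompt: it re-derives the identical "best" next action every iteration.

```python
def stub_model_decide(task: str, context: str) -> str:
    # With no history in `context`, the decision is deterministic and identical
    return "run_test:pytest tests/test_parser.py"

def naive_agent(task: str, max_iterations: int = 10) -> list:
    actions = []
    for _ in range(max_iterations):
        # History is never passed back in, so nothing changes between calls
        action = stub_model_decide(task, context="")
        actions.append(action)
        # ...execute the action, observe the same failure, and repeat...
    return actions

trace = naive_agent("fix failing test")
print(len(set(trace)))  # 1 distinct action across 10 iterations
```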
Solution
Step 1: Implement Action History Tracking
Track what the agent has already attempted in a structured format.
```python
from dataclasses import dataclass, field
from typing import List, Dict, Any
from datetime import datetime


@dataclass
class ActionRecord:
    """Track each action the agent takes"""
    action_type: str            # "run_test", "edit_file", "search_docs"
    parameters: Dict[str, Any]  # {"file": "app.py", "line": 42}
    result: str                 # stdout/stderr or operation result
    timestamp: datetime = field(default_factory=datetime.now)

    def signature(self) -> str:
        """Create unique signature for duplicate detection"""
        # Sort params for consistent hashing
        param_str = str(sorted(self.parameters.items()))
        return f"{self.action_type}:{param_str}"


class AgentMemory:
    def __init__(self, loop_threshold: int = 3):
        self.history: List[ActionRecord] = []
        self.loop_threshold = loop_threshold

    def record(self, action: ActionRecord) -> None:
        self.history.append(action)

    def detect_loop(self) -> bool:
        """Check if agent is repeating actions"""
        if len(self.history) < self.loop_threshold:
            return False
        # Check last N actions for duplicates
        recent = self.history[-self.loop_threshold:]
        signatures = [a.signature() for a in recent]
        # Same action repeated threshold times = loop
        return len(set(signatures)) == 1

    def get_context_summary(self, last_n: int = 10) -> str:
        """Generate summary for LLM context"""
        recent = self.history[-last_n:]
        summary = "## Recent Actions\n"
        for i, action in enumerate(recent, 1):
            summary += f"{i}. {action.action_type}"
            if action.parameters:
                summary += f" ({action.parameters})"
            summary += f"\n   Result: {action.result[:100]}...\n"
        return summary
```
Why this works: Creates a feedback mechanism that surfaces patterns invisible to the LLM alone.
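The sorting inside `signature()` is the load-bearing detail: two logically identical actions must produce byte-identical signatures even if their parameter dicts were built in different insertion orders. A condensed re-statement of that method, standalone for illustration:

```python
# Condensed version of ActionRecord.signature, to show why params are sorted
def signature(action_type: str, parameters: dict) -> str:
    param_str = str(sorted(parameters.items()))
    return f"{action_type}:{param_str}"

# Same logical action, dicts built in different insertion order
a = signature("edit_file", {"file": "app.py", "line": 42})
b = signature("edit_file", {"line": 42, "file": "app.py"})
print(a == b)  # True: insertion order no longer matters
```

Without the sort, the agent could repeat the same edit forever while every repetition hashed differently.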
Step 2: Add Loop Detection to Your Orchestration Layer
Integrate memory into your agent's execution loop with circuit-breaker logic.
```python
from typing import Any, Dict, Optional
import anthropic


class AutonomousCodingAgent:
    def __init__(self, api_key: str, max_iterations: int = 50):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.memory = AgentMemory(loop_threshold=3)
        self.max_iterations = max_iterations

    def run_task(self, task: str) -> str:
        """Execute coding task with loop protection"""
        iteration = 0
        while iteration < self.max_iterations:
            # Check for loops before calling LLM
            if self.memory.detect_loop():
                return self._handle_loop_detected()

            # Get next action from LLM
            action = self._get_next_action(task)
            if action["type"] == "task_complete":
                return action["result"]

            # Execute action and record
            result = self._execute_action(action)
            self.memory.record(ActionRecord(
                action_type=action["type"],
                parameters=action.get("params", {}),
                result=result
            ))
            iteration += 1

        return self._handle_max_iterations()

    def _get_next_action(self, task: str) -> Dict[str, Any]:
        """Query LLM with action history context"""
        system_prompt = f"""You are an autonomous coding agent.

{self.memory.get_context_summary()}

CRITICAL: Review your recent actions above. If you've tried the same
approach multiple times with the same failure, you MUST try a different
strategy or escalate to the user.

Available actions:
- run_test: Execute tests
- edit_file: Modify source code
- search_docs: Query documentation
- task_complete: Finish when solved
- escalate: Ask user for help
"""
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4000,
            system=system_prompt,
            messages=[{"role": "user", "content": task}]
        )
        # Parse action from response (simplified)
        return self._parse_action(response.content[0].text)

    def _handle_loop_detected(self) -> str:
        """Intervention when loop is detected"""
        last_actions = self.memory.history[-3:]

        # Give LLM one chance to break out with explicit warning
        override_prompt = f"""LOOP DETECTED. You just repeated this action 3 times:

{last_actions[0].action_type} with {last_actions[0].parameters}

Each attempt failed with: {last_actions[0].result}

You MUST either:
1. Try a completely different approach
2. Use 'escalate' action to ask the user for help

What will you do differently?"""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2000,
            messages=[{"role": "user", "content": override_prompt}]
        )

        # Execute one more action and record it; without recording, the
        # re-check below would still see the old history and always escalate
        action = self._parse_action(response.content[0].text)
        result = self._execute_action(action)
        self.memory.record(ActionRecord(
            action_type=action["type"],
            parameters=action.get("params", {}),
            result=result
        ))
        if self.memory.detect_loop():
            return f"ESCALATED: Agent stuck after trying:\n{override_prompt}"
        return result
```
Expected behavior: Agent stops and re-evaluates strategy after 3 identical failed actions.
If it still loops:
- Problem: Action signatures too strict (minor param differences counted as unique)
- Solution: Use fuzzy matching or action type only for critical operations
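One possible relaxation (a tuning choice, not part of the code above): for a configurable set of "critical" action types, build the signature from parameter *names* only, so near-identical retries that differ in a line number or offset still count as duplicates.

```python
# Hypothetical coarse-signature variant; CRITICAL_ACTIONS is a tuning choice
CRITICAL_ACTIONS = {"run_test", "edit_file"}

def coarse_signature(action_type: str, parameters: dict) -> str:
    if action_type in CRITICAL_ACTIONS:
        # Keep only the parameter names, dropping values like line numbers
        return f"{action_type}:{sorted(parameters)}"
    return f"{action_type}:{sorted(parameters.items())}"

# Two retries of edit_file at different lines now collide:
s1 = coarse_signature("edit_file", {"file": "app.py", "line": 42})
s2 = coarse_signature("edit_file", {"file": "app.py", "line": 43})
print(s1 == s2)  # True
```

Note the trade-off: this variant also collapses edits to *different* files into one signature, so it should only apply to action types where repetition itself is the signal.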
Step 3: Add Result-Based Loop Detection
Sometimes actions look different but produce identical failures.
```python
import re


class AgentMemory:
    # ... previous code ...

    def detect_result_loop(self, window: int = 5) -> bool:
        """Detect when different actions yield same error"""
        if len(self.history) < window:
            return False
        recent = self.history[-window:]

        # Extract error patterns from results
        error_patterns = []
        for action in recent:
            # Look for common error signatures
            if "Error:" in action.result or "FAILED" in action.result:
                # Normalize error (remove file paths, line numbers)
                normalized = self._normalize_error(action.result)
                error_patterns.append(normalized)

        # If 80%+ of recent actions hit the same error, we're stuck
        if len(error_patterns) >= window * 0.8:
            unique_errors = len(set(error_patterns))
            return unique_errors <= 2
        return False

    @staticmethod
    def _normalize_error(error_text: str) -> str:
        """Strip variable parts of errors for comparison"""
        # Remove line numbers: "line 42" -> "line X"
        normalized = re.sub(r'line \d+', 'line X', error_text)
        # Remove file paths
        normalized = re.sub(r'/[^\s]+\.py', 'FILE.py', normalized)
        # Remove memory addresses
        normalized = re.sub(r'0x[0-9a-f]+', '0xADDR', normalized)
        return normalized[:200]  # First 200 chars
```
Why this matters: Agent might edit different files or run different commands, but if they all produce ModuleNotFoundError, it needs to change its approach.
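To see the normalization in action, here is a standalone demo using the same three regexes as `_normalize_error`: two failures from different files and line numbers collapse to one pattern, so the detector counts them as the *same* error.

```python
import re

def normalize_error(error_text: str) -> str:
    # Same substitutions as AgentMemory._normalize_error above
    normalized = re.sub(r'line \d+', 'line X', error_text)
    normalized = re.sub(r'/[^\s]+\.py', 'FILE.py', normalized)
    normalized = re.sub(r'0x[0-9a-f]+', '0xADDR', normalized)
    return normalized[:200]

# Two superficially different failures with the same root cause
e1 = 'File "/src/app.py", line 42\nModuleNotFoundError: No module named "utils"'
e2 = 'File "/src/cli.py", line 7\nModuleNotFoundError: No module named "utils"'
print(normalize_error(e1) == normalize_error(e2))  # True
```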
Step 4: Structured Reflection Prompts
When a loop is detected, force the agent to explicitly reason about alternatives.
```python
# Inside AutonomousCodingAgent:
def _generate_reflection_prompt(self, stuck_pattern: str) -> str:
    """Create prompt that breaks fixation"""
    return f"""You've been working on this task for {len(self.memory.history)} steps.

Pattern detected: {stuck_pattern}

Before taking your next action, answer these questions:
1. What assumption might be wrong?
2. What haven't you tried yet?
3. Is there missing information you need from the user?
4. Should you simplify the goal or try a workaround?

Think through each, then decide your next action."""
```
Use this prompt when detect_loop() or detect_result_loop() triggers.
Verification
Test the loop detection:
```python
# Create agent with low threshold for testing
agent = AutonomousCodingAgent(
    api_key="your-key",
    max_iterations=20
)
agent.memory.loop_threshold = 2  # Trigger after 2 repeats

# Task that commonly causes loops
result = agent.run_task(
    "Fix the failing unit test in test_parser.py"
)

# Check if escalation happened
assert "ESCALATED" in result or "task_complete" in result
print(f"Agent completed in {len(agent.memory.history)} steps")
```
You should see: Either successful completion or escalation message, never 20+ iterations of the same action.
Production Safeguards
Global Circuit Breakers
```python
@dataclass
class ExecutionLimits:
    max_total_actions: int = 100
    max_api_calls: int = 50
    max_cost_usd: float = 5.0
    max_duration_minutes: int = 30


class SafeAgentExecutor:
    def __init__(self, limits: ExecutionLimits):
        self.limits = limits
        self.start_time = datetime.now()
        self.total_cost = 0.0

    def check_limits(self, agent: AutonomousCodingAgent) -> Optional[str]:
        """Return error message if any limit exceeded"""
        # Time limit (total_seconds, not .seconds, which discards whole days)
        elapsed = (datetime.now() - self.start_time).total_seconds() / 60
        if elapsed > self.limits.max_duration_minutes:
            return f"Timeout: {elapsed:.1f} minutes elapsed"

        # Action limit
        if len(agent.memory.history) > self.limits.max_total_actions:
            return f"Action limit: {len(agent.memory.history)} actions"

        # Cost limit (rough estimate of $0.003 per API call for Sonnet)
        estimated_cost = len(agent.memory.history) * 0.003
        if estimated_cost > self.limits.max_cost_usd:
            return f"Cost limit: ~${estimated_cost:.2f} spent"

        return None
```
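The limit arithmetic can be exercised without an API key by extracting it into a free function over plain values (a test-only restatement of `check_limits`; the agent object is replaced by an action count):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ExecutionLimits:
    max_total_actions: int = 100
    max_api_calls: int = 50
    max_cost_usd: float = 5.0
    max_duration_minutes: int = 30

def check_limits(limits, start_time, action_count, cost_per_call=0.003):
    """Standalone version of SafeAgentExecutor.check_limits, for testing."""
    elapsed = (datetime.now() - start_time).total_seconds() / 60
    if elapsed > limits.max_duration_minutes:
        return f"Timeout: {elapsed:.1f} minutes elapsed"
    if action_count > limits.max_total_actions:
        return f"Action limit: {action_count} actions"
    estimated_cost = action_count * cost_per_call
    if estimated_cost > limits.max_cost_usd:
        return f"Cost limit: ~${estimated_cost:.2f} spent"
    return None

limits = ExecutionLimits(max_total_actions=10)
print(check_limits(limits, datetime.now(), action_count=11))  # Action limit: 11 actions
```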
Logging for Post-Mortem Analysis
```python
import json
from pathlib import Path


def save_execution_trace(agent: AutonomousCodingAgent, task_id: str):
    """Save full execution history for debugging"""
    trace = {
        "task_id": task_id,
        "total_actions": len(agent.memory.history),
        "actions": [
            {
                "type": a.action_type,
                "params": a.parameters,
                "result_preview": a.result[:500],
                "timestamp": a.timestamp.isoformat()
            }
            for a in agent.memory.history
        ]
    }
    trace_file = Path(f"agent_traces/{task_id}.json")
    trace_file.parent.mkdir(exist_ok=True)
    trace_file.write_text(json.dumps(trace, indent=2))
```
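On the post-mortem side, the saved trace can be loaded back to surface the most repeated action type, which is usually where the loop lived. The trace below is hand-written example data in the same shape `save_execution_trace` emits:

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Example trace data, mirroring the structure written by save_execution_trace
trace = {
    "task_id": "demo",
    "total_actions": 4,
    "actions": [
        {"type": "run_test", "params": {}, "result_preview": "FAILED", "timestamp": "2025-01-01T00:00:00"},
        {"type": "edit_file", "params": {"file": "app.py"}, "result_preview": "ok", "timestamp": "2025-01-01T00:00:01"},
        {"type": "run_test", "params": {}, "result_preview": "FAILED", "timestamp": "2025-01-01T00:00:02"},
        {"type": "run_test", "params": {}, "result_preview": "FAILED", "timestamp": "2025-01-01T00:00:03"},
    ],
}
path = Path(tempfile.mkdtemp()) / "demo.json"
path.write_text(json.dumps(trace, indent=2))

# Post-mortem: which action type dominated the run?
loaded = json.loads(path.read_text())
top_action, count = Counter(a["type"] for a in loaded["actions"]).most_common(1)[0]
print(top_action, count)  # run_test 3
```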
What You Learned
- AI agents need explicit action history to avoid loops
- Three detection patterns: action signature, result similarity, iteration count
- Production systems need circuit breakers and cost limits
Key insight: The orchestration layer must provide memory that the LLM lacks natively.
Limitations:
- This adds latency (history serialization per iteration)
- Won't catch semantic loops (agent tries "similar but different" approaches)
- Requires tuning threshold values per task type
Real-World Example: Loop in Production
Scenario: Agent debugging a failing TypeScript build
Without loop detection:
1. Run `npm run build` → Error: Cannot find module 'utils'
2. Add import statement to utils.ts
3. Run `npm run build` → Error: Cannot find module 'utils'
4. Check if utils.ts exists (it does)
5. Run `npm run build` → Error: Cannot find module 'utils'
6. Add different import syntax
7. Run `npm run build` → Error: Cannot find module 'utils'
... (continues for 30+ iterations)
With loop detection:
1. Run `npm run build` → Error: Cannot find module 'utils'
2. Add import statement to utils.ts
3. Run `npm run build` → Error: Cannot find module 'utils'
4. LOOP DETECTED → Reflection prompt triggered
5. Agent realizes: "I've tried editing imports twice with same error.
The issue might be tsconfig.json moduleResolution settings."
6. Check tsconfig.json → finds misconfiguration
7. Update moduleResolution to "Node16"
8. Run `npm run build` → Success
Result: 8 actions instead of 30+, actual problem solved.
Advanced: Semantic Loop Detection
For production systems, add vector similarity comparison:
```python
from sentence_transformers import SentenceTransformer
import numpy as np


class SemanticLoopDetector:
    def __init__(self):
        # Lightweight embedding model
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def are_actions_similar(
        self,
        action1: ActionRecord,
        action2: ActionRecord,
        threshold: float = 0.85
    ) -> bool:
        """Check if two actions are semantically similar"""
        # Create text representation of actions
        text1 = f"{action1.action_type} {action1.parameters} {action1.result[:200]}"
        text2 = f"{action2.action_type} {action2.parameters} {action2.result[:200]}"

        # Get embeddings
        embeddings = self.model.encode([text1, text2])

        # Cosine similarity
        similarity = np.dot(embeddings[0], embeddings[1]) / (
            np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
        )
        return similarity > threshold
```
This catches cases where the agent tries "fix import statement" vs "update import path" - different actions, same intent.
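The similarity step itself is just a dot product over normalized vectors; stripped of the embedding model it reduces to the following (toy 3-d vectors stand in for the 384-d embeddings all-MiniLM-L6-v2 produces, but the math is identical):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

near = cosine([0.9, 0.1, 0.0], [0.8, 0.2, 0.0])  # similar "intents"
far = cosine([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])   # unrelated "intents"
print(near > 0.85 > far)  # True: the 0.85 threshold separates them
```

The 0.85 threshold is a starting point, not a constant of nature; tune it against traces from your own agent.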
Tested with Claude Sonnet 4, GPT-4, Anthropic API 2026.01, Python 3.11+