Problem: Your AI Debugging Agent Gets Stuck in Loops
Your autonomous LLM debugging agent runs the same test 47 times, generates identical error messages, or ping-pongs between two "solutions" that don't work. Token costs explode and nothing gets fixed.
You'll learn:
- Why LLMs loop on debugging tasks
- Circuit breaker patterns for AI agents
- State tracking to detect repetitive behavior
- Prompt engineering to break cycles
Time: 12 min | Level: Advanced
Why This Happens
LLMs have no persistent memory of their own reasoning between calls, and long sessions push earlier attempts out of the effective context. Without explicit state tracking, they'll:
- Try Solution A → observe failure
- Forget they tried A
- Generate "new" solution → it's A again
- Repeat until token limit
Common symptoms:
- Same git command runs 20+ times
- Identical error messages logged repeatedly
- Agent claims "this should work" on 5th attempt of same fix
- Costs spike without progress
- Session times out without resolution
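Several of these symptoms are easy to detect mechanically before they burn tokens. A minimal sketch (the `detect_repetition` helper and its threshold are illustrative, not part of the agent framework):

```python
from collections import Counter

def detect_repetition(command_log: list[str], threshold: int = 5) -> list[str]:
    """Return commands that appear at least `threshold` times in the log."""
    counts = Counter(command_log)
    return [cmd for cmd, n in counts.items() if n >= threshold]

log = ["pytest tests/", "git diff"] + ["pytest tests/"] * 6
print(detect_repetition(log))  # → ['pytest tests/']
```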
Root causes:
- No deduplication of attempted solutions
- Missing exit conditions in agent loops
- Prompt doesn't include failure history
- No circuit breaker for repeated errors
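For contrast, here is the kind of naive loop that exhibits all four root causes at once (the `agent` and `run_tests` objects are hypothetical stand-ins for your own components):

```python
# Anti-pattern: no dedup, no exit condition beyond the range,
# no failure history in the prompt, no circuit breaker.
def naive_debug_loop(agent, run_tests, error: str, max_attempts: int = 50):
    for _ in range(max_attempts):
        fix = agent.generate_fix(error)  # prompt never mentions past failures
        result = run_tests(fix)          # may re-run the identical fix
        if result.success:
            return fix
        error = result.error             # same error feeds the same prompt
    return None                          # burns max_attempts worth of tokens first
```

Each step below removes one of these failure modes.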
Solution
Step 1: Add State Tracking
Track what the agent has already tried to prevent repetition.
```python
from dataclasses import dataclass, field
from typing import List, Set
import hashlib

@dataclass
class DebugSession:
    """Track agent state across debugging loop"""
    attempted_solutions: Set[str] = field(default_factory=set)
    error_history: List[tuple[str, int]] = field(default_factory=list)
    max_attempts_per_solution: int = 2
    max_same_error: int = 3

    def hash_solution(self, code: str, command: str) -> str:
        """Create fingerprint of a solution attempt"""
        # Strip comments and whitespace so cosmetic edits hash identically
        # (naive: does not handle '#' inside string literals)
        no_comments = "\n".join(line.split("#", 1)[0] for line in code.splitlines())
        normalized = "".join(no_comments.split()) + command
        return hashlib.md5(normalized.encode()).hexdigest()[:8]

    def is_duplicate(self, solution_hash: str) -> bool:
        """Check if we've tried this exact approach (records it if new)"""
        if solution_hash in self.attempted_solutions:
            return True
        self.attempted_solutions.add(solution_hash)
        return False

    def track_error(self, error_msg: str) -> bool:
        """Returns True if we should halt (too many repeats)"""
        # Track error frequency
        for i, (msg, count) in enumerate(self.error_history):
            if msg == error_msg:
                self.error_history[i] = (msg, count + 1)
                return count + 1 >= self.max_same_error
        self.error_history.append((error_msg, 1))
        return False
```
Why this works: hashing a normalized copy of the solution creates a fingerprint that ignores superficial changes (whitespace, comments) but catches exact logical repeats. Note that it does not ignore renamed variables: semantically identical code with different identifiers will hash differently.
Step 2: Implement Circuit Breaker
Stop the agent after N failures to prevent runaway loops.
```python
from enum import Enum
from datetime import datetime, timedelta

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Too many failures, stop trying
    HALF_OPEN = "half_open"  # Testing if issue resolved

class DebugCircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        timeout_seconds: int = 300,  # 5 min cooldown
        success_threshold: int = 2,  # Successes to close circuit
    ):
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self.failure_threshold = failure_threshold
        self.timeout = timedelta(seconds=timeout_seconds)
        self.success_threshold = success_threshold

    def record_success(self) -> None:
        """Agent made progress"""
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
                self.success_count = 0
        else:
            self.failure_count = max(0, self.failure_count - 1)

    def record_failure(self) -> None:
        """Agent failed to make progress"""
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def should_allow_attempt(self) -> tuple[bool, str]:
        """Check if agent should try another solution"""
        if self.state == CircuitState.CLOSED:
            return True, "Circuit closed, operating normally"
        if self.state == CircuitState.OPEN:
            # Check if timeout elapsed
            if datetime.now() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
                self.success_count = 0
                return True, "Circuit half-open, testing recovery"
            return False, f"Circuit open: {self.failure_count} failures. Wait {self.timeout.seconds}s."
        # HALF_OPEN: allow attempts but watch closely
        return True, "Circuit half-open, monitoring"
```
```python
# Usage in agent loop
breaker = DebugCircuitBreaker(failure_threshold=5)
session = DebugSession()

for attempt in range(50):  # Hard limit
    allowed, reason = breaker.should_allow_attempt()
    if not allowed:
        print(f"❌ Stopping: {reason}")
        break

    solution = agent.generate_fix(error_context)
    solution_hash = session.hash_solution(solution.code, solution.command)

    if session.is_duplicate(solution_hash):
        print(f"⚠️ Duplicate solution detected: {solution_hash}")
        breaker.record_failure()
        continue

    result = execute_solution(solution)
    if result.success:
        breaker.record_success()
        print("✅ Fix applied successfully")
        break
    else:
        if session.track_error(result.error):
            print("❌ Same error 3+ times, halting")
            break
        breaker.record_failure()
```
Expected behavior: the agent stops after 5 failed attempts, waits out the 5-minute cooldown, then probes in "half-open" mode: 2 successes close the circuit again, while a single new failure reopens it.
If it fails:
- Still loops: lower `failure_threshold` to 3
- Stops too early: increase `failure_threshold` or raise `max_same_error`
- Timeout ignored: check that the system clock isn't being mocked in tests
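The open → half-open transition can be traced with a condensed breaker (a trimmed, illustrative restatement of the state machine above, with a shortened cooldown so the transition is observable immediately):

```python
import time
from datetime import datetime, timedelta

class MiniBreaker:
    """Trimmed restatement of the circuit breaker's state transitions."""
    def __init__(self, threshold: int = 3, cooldown: float = 0.0):
        self.state, self.failures = "closed", 0
        self.last_failure = None
        self.threshold = threshold
        self.cooldown = timedelta(seconds=cooldown)

    def fail(self):
        self.failures += 1
        self.last_failure = datetime.now()
        if self.failures >= self.threshold:
            self.state = "open"

    def allow(self) -> bool:
        if self.state == "open" and datetime.now() - self.last_failure > self.cooldown:
            self.state = "half_open"  # cooldown elapsed: probe cautiously
        return self.state != "open"

b = MiniBreaker(threshold=3, cooldown=0.05)
for _ in range(3):
    b.fail()
print(b.state)    # open: threshold reached
time.sleep(0.1)   # wait out the (shortened) cooldown
print(b.allow())  # True: probe attempt allowed
print(b.state)    # half_open
```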
Step 3: Inject History Into Prompts
Make the LLM aware of what it's already tried.
```python
def build_debugging_prompt(
    error: str,
    session: DebugSession,
    context: dict,
) -> str:
    """Create prompt with failure history"""
    # Format previous attempts
    attempts_summary = ""
    if session.attempted_solutions:
        attempts_summary = "\n**Previous attempts that FAILED:**\n"
        for i, hash_id in enumerate(session.attempted_solutions, 1):
            attempts_summary += f"{i}. Solution {hash_id} - did not resolve issue\n"

    # Format error patterns
    error_patterns = ""
    if session.error_history:
        error_patterns = "\n**Recurring errors:**\n"
        for msg, count in session.error_history:
            error_patterns += f"- '{msg}' (seen {count}x)\n"

    prompt = f"""You are debugging this error:

{error}
{attempts_summary}
{error_patterns}
**CRITICAL RULES:**
1. DO NOT suggest solutions you've already tried (listed above)
2. If the same error appears 3+ times, the approach is wrong - try a completely different strategy
3. Explain WHY your new solution is different from previous attempts
4. If you cannot think of a novel solution, respond with "ESCALATE TO HUMAN"

**Context:**
- Language: {context['language']}
- Framework: {context['framework']}
- Test output: {context['test_output']}

Provide ONE solution that is fundamentally different from previous attempts.
"""
    return prompt
```
Why this works: LLMs respond well to explicit constraints. Showing the failed attempts and requiring an explanation of how the new solution differs breaks repetitive behavior.
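The prompt's escape hatch only works if the loop actually checks for it. A minimal handler (the `route_response` helper is a hypothetical addition, not part of the loop above):

```python
def route_response(text: str) -> str:
    """Route the agent's reply before trying to apply it as a fix."""
    if "ESCALATE TO HUMAN" in text.upper():
        return "escalate"  # hand off instead of looping further
    return "apply_fix"

print(route_response("No novel approach found. ESCALATE TO HUMAN"))  # → escalate
print(route_response("Try pinning the dependency to 2.1.x"))         # → apply_fix
```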
Step 4: Add Diversity Sampling
Force model to explore different solution spaces.
```python
import anthropic

def generate_diverse_solutions(
    client: anthropic.Anthropic,
    prompt: str,
    n_solutions: int = 3,
) -> list[str]:
    """Generate multiple solutions with temperature variance"""
    solutions = []
    # Use different temperatures to get variety
    temperatures = [0.3, 0.7, 1.0][:n_solutions]

    for temp in temperatures:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2000,
            temperature=temp,  # Higher = more creative
            messages=[{"role": "user", "content": prompt}],
        )
        solutions.append(response.content[0].text)

    return solutions

# In agent loop
solutions = generate_diverse_solutions(client, prompt, n_solutions=3)

# Try them in order, skip duplicates
for solution in solutions:
    solution_hash = session.hash_solution(solution, "")
    if not session.is_duplicate(solution_hash):
        # This is a novel approach, try it
        break
```
Expected: Three genuinely different approaches, not minor variations.
Verification
Test the circuit breaker:
```python
# Simulate failures against a fresh breaker
breaker = DebugCircuitBreaker(failure_threshold=5)
for i in range(10):
    breaker.record_failure()
    allowed, msg = breaker.should_allow_attempt()
    print(f"Attempt {i}: {allowed} - {msg}")

# Should see:
# Attempt 0-3: True (failures accumulating)
# Attempt 4-9: False (circuit opens once the 5th failure is recorded)
```
Test duplicate detection:
```python
session = DebugSession()
code1 = "def fix(): return True"
code2 = "def fix(): return True  # same logic"

hash1 = session.hash_solution(code1, "pytest")
hash2 = session.hash_solution(code2, "pytest")
assert hash1 == hash2, "Should detect as duplicate despite comments/whitespace"
```
You should see: Circuit opens after threshold, duplicates caught despite formatting differences.
What You Learned
- State tracking prevents repetition: Hash solutions to detect duplicates
- Circuit breakers contain damage: Stop runaway loops before costs explode
- History in prompts works: LLMs avoid repeats when explicitly told
- Temperature diversity helps: Multiple samples at different temps find novel solutions
Limitations:
- Hashing won't catch semantically identical solutions with different syntax
- Circuit breaker adds latency (cooldown period)
- Requires tuning thresholds per use case
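The first limitation is easy to demonstrate: two semantically identical fixes hash differently once an identifier changes (using a normalization similar to `hash_solution`'s; the `fingerprint` helper here is illustrative):

```python
import hashlib

def fingerprint(code: str) -> str:
    # Strip trailing comments and whitespace, then hash
    no_comments = "\n".join(line.split("#", 1)[0] for line in code.splitlines())
    return hashlib.md5("".join(no_comments.split()).encode()).hexdigest()[:8]

a = fingerprint("def fix(x): return x + 1")
b = fingerprint("def fix(y): return y + 1")  # same logic, renamed variable
print(a == b)  # False: hashing misses semantic duplicates
```

Catching these would require AST-level or embedding-based comparison rather than string hashing.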
When NOT to use:
- Simple, deterministic debugging (just fix it directly)
- Single-shot LLM calls (no loop risk)
- Non-autonomous workflows where human reviews each step
Production Considerations
Monitoring
```python
import structlog

logger = structlog.get_logger()

def log_circuit_event(breaker: DebugCircuitBreaker, event: str):
    logger.info(
        "circuit_breaker_event",
        state=breaker.state.value,
        failure_count=breaker.failure_count,
        event=event,
        # Alert ops if circuit opens
        alert=breaker.state == CircuitState.OPEN,
    )
```
Rate Limiting
Combine with API rate limits to prevent cost overruns:
```python
from ratelimit import limits, sleep_and_retry

@sleep_and_retry
@limits(calls=10, period=60)  # 10 LLM calls per minute max
def call_llm_with_limit(client, prompt):
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
```
Graceful Degradation
```python
def autonomous_debug_with_fallback(error: str) -> str:
    try:
        return autonomous_agent.fix(error)
    except CircuitBreakerOpenError:  # custom exception your agent raises when the breaker is open
        # Fall back to simpler heuristics
        return heuristic_fixer.suggest_fix(error)
    except Exception:
        # Always provide an escape hatch
        return "ESCALATE_TO_HUMAN"
```
Tested with Claude Sonnet 4.5, Anthropic Python SDK 0.40+, Python 3.11+
Token costs for test run:
- Without circuit breaker: ~850K tokens ($6.80)
- With circuit breaker: ~120K tokens ($0.96)
- Savings: 86% reduction in runaway scenarios