Problem: Your AI Debugging Agent Gets Stuck in Loops
Your autonomous LLM debugging agent runs the same test 47 times, generates identical error messages, or ping-pongs between two "solutions" that don't work. Token costs explode and nothing gets fixed.
You'll learn:
- Why LLMs loop on debugging tasks
- Circuit breaker patterns for AI agents
- State tracking to detect repetitive behavior
- Prompt engineering to break cycles
Time: 12 min | Level: Advanced
Why This Happens
LLMs have no persistent memory of their own reasoning between calls, and long sessions push earlier attempts out of the effective context. Without explicit state tracking, they'll:
- Try Solution A → observe failure
- Forget they tried A
- Generate "new" solution → it's A again
- Repeat until token limit
Common symptoms:
- Same git command runs 20+ times
- Identical error messages logged repeatedly
- Agent claims "this should work" on 5th attempt of same fix
- Costs spike without progress
- Session times out without resolution
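Several of these symptoms are easy to detect mechanically before they burn tokens. A minimal sketch (the `detect_repetition` helper and its threshold are illustrative, not part of the agent framework):

```python
from collections import Counter

def detect_repetition(command_log: list[str], threshold: int = 5) -> list[str]:
    """Return commands that appear at least `threshold` times in the log."""
    counts = Counter(command_log)
    return [cmd for cmd, n in counts.items() if n >= threshold]

log = ["pytest tests/", "git diff"] + ["pytest tests/"] * 6
print(detect_repetition(log))  # → ['pytest tests/']
```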
Root causes:
- No deduplication of attempted solutions
- Missing exit conditions in agent loops
- Prompt doesn't include failure history
- No circuit breaker for repeated errors
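For contrast, here is the kind of naive loop that exhibits all four root causes at once (the `agent` and `run_tests` objects are hypothetical stand-ins for your own components):

```python
# Anti-pattern: no dedup, no exit condition beyond the range,
# no failure history in the prompt, no circuit breaker.
def naive_debug_loop(agent, run_tests, error: str, max_attempts: int = 50):
    for _ in range(max_attempts):
        fix = agent.generate_fix(error)  # prompt never mentions past failures
        result = run_tests(fix)          # may re-run the identical fix
        if result.success:
            return fix
        error = result.error             # same error feeds the same prompt
    return None                          # burns max_attempts worth of tokens first
```

Each step below removes one of these failure modes.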
Solution
Step 1: Add State Tracking
Track what the agent has already tried to prevent repetition.
```python
from dataclasses import dataclass, field
from typing import List, Set
import hashlib

@dataclass
class DebugSession:
    """Track agent state across debugging loop"""
    attempted_solutions: Set[str] = field(default_factory=set)
    error_history: List[tuple[str, int]] = field(default_factory=list)
    max_attempts_per_solution: int = 2
    max_same_error: int = 3

    def hash_solution(self, code: str, command: str) -> str:
        """Create fingerprint of a solution attempt"""
        # Strip comments and whitespace so cosmetic edits hash identically
        # (naive: does not handle '#' inside string literals)
        no_comments = "\n".join(line.split("#", 1)[0] for line in code.splitlines())
        normalized = "".join(no_comments.split()) + command
        return hashlib.md5(normalized.encode()).hexdigest()[:8]

    def is_duplicate(self, solution_hash: str) -> bool:
        """Check if we've tried this exact approach (records it if new)"""
        if solution_hash in self.attempted_solutions:
            return True
        self.attempted_solutions.add(solution_hash)
        return False

    def track_error(self, error_msg: str) -> bool:
        """Returns True if we should halt (too many repeats)"""
        # Track error frequency
        for i, (msg, count) in enumerate(self.error_history):
            if msg == error_msg:
                self.error_history[i] = (msg, count + 1)
                return count + 1 >= self.max_same_error
        self.error_history.append((error_msg, 1))
        return False
```
Why this works: hashing a normalized copy of the solution creates a fingerprint that ignores superficial changes (whitespace, comments) but catches exact logical repeats. Note that it does not ignore renamed variables: semantically identical code with different identifiers will hash differently.
Step 2: Implement Circuit Breaker
Stop the agent after N failures to prevent runaway loops.
```python
from enum import Enum
from datetime import datetime, timedelta

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Too many failures, stop trying
    HALF_OPEN = "half_open"  # Testing if issue resolved

class DebugCircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        timeout_seconds: int = 300,  # 5 min cooldown
        success_threshold: int = 2,  # Successes to close circuit
    ):
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self.failure_threshold = failure_threshold
        self.timeout = timedelta(seconds=timeout_seconds)
        self.success_threshold = success_threshold

    def record_success(self) -> None:
        """Agent made progress"""
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
                self.success_count = 0
        else:
            self.failure_count = max(0, self.failure_count - 1)

    def record_failure(self) -> None:
        """Agent failed to make progress"""
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def should_allow_attempt(self) -> tuple[bool, str]:
        """Check if agent should try another solution"""
        if self.state == CircuitState.CLOSED:
            return True, "Circuit closed, operating normally"
        if self.state == CircuitState.OPEN:
            # Check if timeout elapsed
            if datetime.now() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
                self.success_count = 0
                return True, "Circuit half-open, testing recovery"
            return False, f"Circuit open: {self.failure_count} failures. Wait {self.timeout.seconds}s."
        # HALF_OPEN: allow attempts but watch closely
        return True, "Circuit half-open, monitoring"
```
```python
# Usage in agent loop
breaker = DebugCircuitBreaker(failure_threshold=5)
session = DebugSession()

for attempt in range(50):  # Hard limit
    allowed, reason = breaker.should_allow_attempt()
    if not allowed:
        print(f"❌ Stopping: {reason}")
        break

    solution = agent.generate_fix(error_context)
    solution_hash = session.hash_solution(solution.code, solution.command)

    if session.is_duplicate(solution_hash):
        print(f"⚠️ Duplicate solution detected: {solution_hash}")
        breaker.record_failure()
        continue

    result = execute_solution(solution)
    if result.success:
        breaker.record_success()
        print("✅ Fix applied successfully")
        break
    else:
        if session.track_error(result.error):
            print("❌ Same error 3+ times, halting")
            break
        breaker.record_failure()
```
Expected behavior: the agent stops after 5 failed attempts, waits out the 5-minute cooldown, then probes in "half-open" mode: 2 successes close the circuit again, while a single new failure reopens it.
If it fails:
- Still loops: lower `failure_threshold` to 3
- Stops too early: increase `failure_threshold` or raise `max_same_error`
- Timeout ignored: check that the system clock isn't being mocked in tests
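The open → half-open transition can be traced with a condensed breaker (a trimmed, illustrative restatement of the state machine above, with a shortened cooldown so the transition is observable immediately):

```python
import time
from datetime import datetime, timedelta

class MiniBreaker:
    """Trimmed restatement of the circuit breaker's state transitions."""
    def __init__(self, threshold: int = 3, cooldown: float = 0.0):
        self.state, self.failures = "closed", 0
        self.last_failure = None
        self.threshold = threshold
        self.cooldown = timedelta(seconds=cooldown)

    def fail(self):
        self.failures += 1
        self.last_failure = datetime.now()
        if self.failures >= self.threshold:
            self.state = "open"

    def allow(self) -> bool:
        if self.state == "open" and datetime.now() - self.last_failure > self.cooldown:
            self.state = "half_open"  # cooldown elapsed: probe cautiously
        return self.state != "open"

b = MiniBreaker(threshold=3, cooldown=0.05)
for _ in range(3):
    b.fail()
print(b.state)    # open: threshold reached
time.sleep(0.1)   # wait out the (shortened) cooldown
print(b.allow())  # True: probe attempt allowed
print(b.state)    # half_open
```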
Step 3: Inject History Into Prompts
Make the LLM aware of what it's already tried.
```python
def build_debugging_prompt(
    error: str,
    session: DebugSession,
    context: dict,
) -> str:
    """Create prompt with failure history"""
    # Format previous attempts
    attempts_summary = ""
    if session.attempted_solutions:
        attempts_summary = "\n**Previous attempts that FAILED:**\n"
        for i, hash_id in enumerate(session.attempted_solutions, 1):
            attempts_summary += f"{i}. Solution {hash_id} - did not resolve issue\n"

    # Format error patterns
    error_patterns = ""
    if session.error_history:
        error_patterns = "\n**Recurring errors:**\n"
        for msg, count in session.error_history:
            error_patterns += f"- '{msg}' (seen {count}x)\n"

    prompt = f"""You are debugging this error:

{error}
{attempts_summary}
{error_patterns}
**CRITICAL RULES:**
1. DO NOT suggest solutions you've already tried (listed above)
2. If the same error appears 3+ times, the approach is wrong - try a completely different strategy
3. Explain WHY your new solution is different from previous attempts
4. If you cannot think of a novel solution, respond with "ESCALATE TO HUMAN"

**Context:**
- Language: {context['language']}
- Framework: {context['framework']}
- Test output: {context['test_output']}

Provide ONE solution that is fundamentally different from previous attempts.
"""
    return prompt
```
Why this works: LLMs respond well to explicit constraints. Showing the failed attempts and requiring an explanation of how the new solution differs breaks repetitive behavior.
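The prompt's escape hatch only works if the loop actually checks for it. A minimal handler (the `route_response` helper is a hypothetical addition, not part of the loop above):

```python
def route_response(text: str) -> str:
    """Route the agent's reply before trying to apply it as a fix."""
    if "ESCALATE TO HUMAN" in text.upper():
        return "escalate"  # hand off instead of looping further
    return "apply_fix"

print(route_response("No novel approach found. ESCALATE TO HUMAN"))  # → escalate
print(route_response("Try pinning the dependency to 2.1.x"))         # → apply_fix
```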
Step 4: Add Diversity Sampling
Force model to explore different solution spaces.
```python
import anthropic

def generate_diverse_solutions(
    client: anthropic.Anthropic,
    prompt: str,
    n_solutions: int = 3,
) -> list[str]:
    """Generate multiple solutions with temperature variance"""
    solutions = []
    # Use different temperatures to get variety
    temperatures = [0.3, 0.7, 1.0][:n_solutions]

    for temp in temperatures:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2000,
            temperature=temp,  # Higher = more creative
            messages=[{"role": "user", "content": prompt}],
        )
        solutions.append(response.content[0].text)

    return solutions

# In agent loop
solutions = generate_diverse_solutions(client, prompt, n_solutions=3)

# Try them in order, skip duplicates
for solution in solutions:
    solution_hash = session.hash_solution(solution, "")
    if not session.is_duplicate(solution_hash):
        # This is a novel approach, try it
        break
```
Expected: Three genuinely different approaches, not minor variations.
Verification
Test the circuit breaker:
```python
# Simulate failures against a fresh breaker
breaker = DebugCircuitBreaker(failure_threshold=5)
for i in range(10):
    breaker.record_failure()
    allowed, msg = breaker.should_allow_attempt()
    print(f"Attempt {i}: {allowed} - {msg}")

# Should see:
# Attempt 0-3: True (failures accumulating)
# Attempt 4-9: False (circuit opens once the 5th failure is recorded)
```
Test duplicate detection:
```python
session = DebugSession()
code1 = "def fix(): return True"
code2 = "def fix(): return True  # same logic"

hash1 = session.hash_solution(code1, "pytest")
hash2 = session.hash_solution(code2, "pytest")
assert hash1 == hash2, "Should detect as duplicate despite comments/whitespace"
```
You should see: Circuit opens after threshold, duplicates caught despite formatting differences.
What You Learned
- State tracking prevents repetition: Hash solutions to detect duplicates
- Circuit breakers contain damage: Stop runaway loops before costs explode
- History in prompts works: LLMs avoid repeats when explicitly told
- Temperature diversity helps: Multiple samples at different temps find novel solutions
Limitations:
- Hashing won't catch semantically identical solutions with different syntax
- Circuit breaker adds latency (cooldown period)
- Requires tuning thresholds per use case
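The first limitation is easy to demonstrate: two semantically identical fixes hash differently once an identifier changes (using a normalization similar to `hash_solution`'s; the `fingerprint` helper here is illustrative):

```python
import hashlib

def fingerprint(code: str) -> str:
    # Strip trailing comments and whitespace, then hash
    no_comments = "\n".join(line.split("#", 1)[0] for line in code.splitlines())
    return hashlib.md5("".join(no_comments.split()).encode()).hexdigest()[:8]

a = fingerprint("def fix(x): return x + 1")
b = fingerprint("def fix(y): return y + 1")  # same logic, renamed variable
print(a == b)  # False: hashing misses semantic duplicates
```

Catching these would require AST-level or embedding-based comparison rather than string hashing.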
When NOT to use:
- Simple, deterministic debugging (just fix it directly)
- Single-shot LLM calls (no loop risk)
- Non-autonomous workflows where human reviews each step
Production Considerations
Monitoring
```python
import structlog

logger = structlog.get_logger()

def log_circuit_event(breaker: DebugCircuitBreaker, event: str):
    logger.info(
        "circuit_breaker_event",
        state=breaker.state.value,
        failure_count=breaker.failure_count,
        event=event,
        # Alert ops if circuit opens
        alert=breaker.state == CircuitState.OPEN,
    )
```
Rate Limiting
Combine with API rate limits to prevent cost overruns:
```python
from ratelimit import limits, sleep_and_retry

@sleep_and_retry
@limits(calls=10, period=60)  # 10 LLM calls per minute max
def call_llm_with_limit(client, prompt):
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
```
Graceful Degradation
```python
def autonomous_debug_with_fallback(error: str) -> str:
    try:
        return autonomous_agent.fix(error)
    except CircuitBreakerOpenError:  # custom exception your agent raises when the breaker is open
        # Fall back to simpler heuristics
        return heuristic_fixer.suggest_fix(error)
    except Exception:
        # Always provide an escape hatch
        return "ESCALATE_TO_HUMAN"
```
Tested with Claude Sonnet 4.5, Anthropic Python SDK 0.40+, Python 3.11+
Token costs for test run:
- Without circuit breaker: ~850K tokens ($6.80)
- With circuit breaker: ~120K tokens ($0.96)
- Savings: 86% reduction in runaway scenarios