Problem: Your AI Agent Gets Stuck in Infinite Loops
Your autonomous coding agent runs the same failed test 47 times, edits the same file repeatedly, or keeps searching for solutions it already tried. You're burning API credits and getting no useful work done.
You'll learn:
- Why AI agents loop on coding tasks
- Three debugging patterns to detect loops early
- Production-ready safeguards for autonomous systems
Time: 25 min | Level: Advanced
Why This Happens
AI agents lack implicit memory of their action history within a single task. Each decision uses context from the conversation, but the model doesn't inherently "know" it already tried something unless that information is explicitly in the prompt.
Common symptoms:
- Agent runs `pytest` → sees failure → edits code → runs `pytest` with identical results
- Searches documentation for the same query 5+ times
- Applies the same fix pattern to different files expecting different outcomes
- Burns through token limits or API rate limits
Root cause: No built-in loop detection between the orchestration layer (your code) and the LLM's reasoning layer.
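The failure mode is easy to reproduce with a stub. The "model" below is a placeholder, not a real LLM call, but it behaves the same way a real model does when no action history reaches its prompt: it re-derives the identical "best" next action every iteration.

```python
def stub_model_decide(task: str, context: str) -> str:
    # With no history in `context`, the decision is deterministic and identical
    return "run_test:pytest tests/test_parser.py"

def naive_agent(task: str, max_iterations: int = 10) -> list:
    actions = []
    for _ in range(max_iterations):
        # History is never passed back in, so nothing changes between calls
        action = stub_model_decide(task, context="")
        actions.append(action)
        # ...execute the action, observe the same failure, and repeat...
    return actions

trace = naive_agent("fix failing test")
print(len(set(trace)))  # 1 distinct action across 10 iterations
```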
Solution
Step 1: Implement Action History Tracking
Track what the agent has already attempted in a structured format.
```python
from dataclasses import dataclass, field
from typing import List, Dict, Any
from datetime import datetime


@dataclass
class ActionRecord:
    """Track each action the agent takes"""
    action_type: str            # "run_test", "edit_file", "search_docs"
    parameters: Dict[str, Any]  # {"file": "app.py", "line": 42}
    result: str                 # stdout/stderr or operation result
    timestamp: datetime = field(default_factory=datetime.now)

    def signature(self) -> str:
        """Create unique signature for duplicate detection"""
        # Sort params for consistent hashing
        param_str = str(sorted(self.parameters.items()))
        return f"{self.action_type}:{param_str}"


class AgentMemory:
    def __init__(self, loop_threshold: int = 3):
        self.history: List[ActionRecord] = []
        self.loop_threshold = loop_threshold

    def record(self, action: ActionRecord) -> None:
        self.history.append(action)

    def detect_loop(self) -> bool:
        """Check if agent is repeating actions"""
        if len(self.history) < self.loop_threshold:
            return False
        # Check last N actions for duplicates
        recent = self.history[-self.loop_threshold:]
        signatures = [a.signature() for a in recent]
        # Same action repeated threshold times = loop
        return len(set(signatures)) == 1

    def get_context_summary(self, last_n: int = 10) -> str:
        """Generate summary for LLM context"""
        recent = self.history[-last_n:]
        summary = "## Recent Actions\n"
        for i, action in enumerate(recent, 1):
            summary += f"{i}. {action.action_type}"
            if action.parameters:
                summary += f" ({action.parameters})"
            summary += f"\n   Result: {action.result[:100]}...\n"
        return summary
```
Why this works: Creates a feedback mechanism that surfaces patterns invisible to the LLM alone.
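The sorting inside `signature()` is the load-bearing detail: two logically identical actions must produce byte-identical signatures even if their parameter dicts were built in different insertion orders. A condensed re-statement of that method, standalone for illustration:

```python
# Condensed version of ActionRecord.signature, to show why params are sorted
def signature(action_type: str, parameters: dict) -> str:
    param_str = str(sorted(parameters.items()))
    return f"{action_type}:{param_str}"

# Same logical action, dicts built in different insertion order
a = signature("edit_file", {"file": "app.py", "line": 42})
b = signature("edit_file", {"line": 42, "file": "app.py"})
print(a == b)  # True: insertion order no longer matters
```

Without the sort, the agent could repeat the same edit forever while every repetition hashed differently.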
Step 2: Add Loop Detection to Your Orchestration Layer
Integrate memory into your agent's execution loop with circuit-breaker logic.
```python
from typing import Any, Dict, Optional
import anthropic


class AutonomousCodingAgent:
    def __init__(self, api_key: str, max_iterations: int = 50):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.memory = AgentMemory(loop_threshold=3)
        self.max_iterations = max_iterations

    def run_task(self, task: str) -> str:
        """Execute coding task with loop protection"""
        iteration = 0
        while iteration < self.max_iterations:
            # Check for loops before calling LLM
            if self.memory.detect_loop():
                return self._handle_loop_detected()

            # Get next action from LLM
            action = self._get_next_action(task)
            if action["type"] == "task_complete":
                return action["result"]

            # Execute action and record
            result = self._execute_action(action)
            self.memory.record(ActionRecord(
                action_type=action["type"],
                parameters=action.get("params", {}),
                result=result
            ))
            iteration += 1

        return self._handle_max_iterations()

    def _get_next_action(self, task: str) -> Dict[str, Any]:
        """Query LLM with action history context"""
        system_prompt = f"""You are an autonomous coding agent.

{self.memory.get_context_summary()}

CRITICAL: Review your recent actions above. If you've tried the same
approach multiple times with the same failure, you MUST try a different
strategy or escalate to the user.

Available actions:
- run_test: Execute tests
- edit_file: Modify source code
- search_docs: Query documentation
- task_complete: Finish when solved
- escalate: Ask user for help
"""
        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4000,
            system=system_prompt,
            messages=[{"role": "user", "content": task}]
        )
        # Parse action from response (simplified)
        return self._parse_action(response.content[0].text)

    def _handle_loop_detected(self) -> str:
        """Intervention when loop is detected"""
        last_actions = self.memory.history[-3:]

        # Give LLM one chance to break out with explicit warning
        override_prompt = f"""LOOP DETECTED. You just repeated this action 3 times:

{last_actions[0].action_type} with {last_actions[0].parameters}

Each attempt failed with: {last_actions[0].result}

You MUST either:
1. Try a completely different approach
2. Use 'escalate' action to ask the user for help

What will you do differently?"""

        response = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2000,
            messages=[{"role": "user", "content": override_prompt}]
        )

        # Execute one more action and record it; without recording, the
        # re-check below would still see the old history and always escalate
        action = self._parse_action(response.content[0].text)
        result = self._execute_action(action)
        self.memory.record(ActionRecord(
            action_type=action["type"],
            parameters=action.get("params", {}),
            result=result
        ))
        if self.memory.detect_loop():
            return f"ESCALATED: Agent stuck after trying:\n{override_prompt}"
        return result
```
Expected behavior: Agent stops and re-evaluates strategy after 3 identical failed actions.
If it still loops:
- Problem: Action signatures too strict (minor param differences counted as unique)
- Solution: Use fuzzy matching or action type only for critical operations
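One possible relaxation (a tuning choice, not part of the code above): for a configurable set of "critical" action types, build the signature from parameter *names* only, so near-identical retries that differ in a line number or offset still count as duplicates.

```python
# Hypothetical coarse-signature variant; CRITICAL_ACTIONS is a tuning choice
CRITICAL_ACTIONS = {"run_test", "edit_file"}

def coarse_signature(action_type: str, parameters: dict) -> str:
    if action_type in CRITICAL_ACTIONS:
        # Keep only the parameter names, dropping values like line numbers
        return f"{action_type}:{sorted(parameters)}"
    return f"{action_type}:{sorted(parameters.items())}"

# Two retries of edit_file at different lines now collide:
s1 = coarse_signature("edit_file", {"file": "app.py", "line": 42})
s2 = coarse_signature("edit_file", {"file": "app.py", "line": 43})
print(s1 == s2)  # True
```

Note the trade-off: this variant also collapses edits to *different* files into one signature, so it should only apply to action types where repetition itself is the signal.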
Step 3: Add Result-Based Loop Detection
Sometimes actions look different but produce identical failures.
```python
import re


class AgentMemory:
    # ... previous code ...

    def detect_result_loop(self, window: int = 5) -> bool:
        """Detect when different actions yield same error"""
        if len(self.history) < window:
            return False
        recent = self.history[-window:]

        # Extract error patterns from results
        error_patterns = []
        for action in recent:
            # Look for common error signatures
            if "Error:" in action.result or "FAILED" in action.result:
                # Normalize error (remove file paths, line numbers)
                normalized = self._normalize_error(action.result)
                error_patterns.append(normalized)

        # If 80%+ of recent actions hit the same error, we're stuck
        if len(error_patterns) >= window * 0.8:
            unique_errors = len(set(error_patterns))
            return unique_errors <= 2
        return False

    @staticmethod
    def _normalize_error(error_text: str) -> str:
        """Strip variable parts of errors for comparison"""
        # Remove line numbers: "line 42" -> "line X"
        normalized = re.sub(r'line \d+', 'line X', error_text)
        # Remove file paths
        normalized = re.sub(r'/[^\s]+\.py', 'FILE.py', normalized)
        # Remove memory addresses
        normalized = re.sub(r'0x[0-9a-f]+', '0xADDR', normalized)
        return normalized[:200]  # First 200 chars
```
Why this matters: Agent might edit different files or run different commands, but if they all produce ModuleNotFoundError, it needs to change its approach.
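To see the normalization in action, here is a standalone demo using the same three regexes as `_normalize_error`: two failures from different files and line numbers collapse to one pattern, so the detector counts them as the *same* error.

```python
import re

def normalize_error(error_text: str) -> str:
    # Same substitutions as AgentMemory._normalize_error above
    normalized = re.sub(r'line \d+', 'line X', error_text)
    normalized = re.sub(r'/[^\s]+\.py', 'FILE.py', normalized)
    normalized = re.sub(r'0x[0-9a-f]+', '0xADDR', normalized)
    return normalized[:200]

# Two superficially different failures with the same root cause
e1 = 'File "/src/app.py", line 42\nModuleNotFoundError: No module named "utils"'
e2 = 'File "/src/cli.py", line 7\nModuleNotFoundError: No module named "utils"'
print(normalize_error(e1) == normalize_error(e2))  # True
```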
Step 4: Structured Reflection Prompts
When a loop is detected, force the agent to explicitly reason about alternatives.
```python
# Inside AutonomousCodingAgent:
def _generate_reflection_prompt(self, stuck_pattern: str) -> str:
    """Create prompt that breaks fixation"""
    return f"""You've been working on this task for {len(self.memory.history)} steps.

Pattern detected: {stuck_pattern}

Before taking your next action, answer these questions:
1. What assumption might be wrong?
2. What haven't you tried yet?
3. Is there missing information you need from the user?
4. Should you simplify the goal or try a workaround?

Think through each, then decide your next action."""
```
Use this prompt when detect_loop() or detect_result_loop() triggers.
Verification
Test the loop detection:
```python
# Create agent with low threshold for testing
agent = AutonomousCodingAgent(
    api_key="your-key",
    max_iterations=20
)
agent.memory.loop_threshold = 2  # Trigger after 2 repeats

# Task that commonly causes loops
result = agent.run_task(
    "Fix the failing unit test in test_parser.py"
)

# Check if escalation happened
assert "ESCALATED" in result or "task_complete" in result
print(f"Agent completed in {len(agent.memory.history)} steps")
```
You should see: Either successful completion or escalation message, never 20+ iterations of the same action.
Production Safeguards
Global Circuit Breakers
```python
@dataclass
class ExecutionLimits:
    max_total_actions: int = 100
    max_api_calls: int = 50
    max_cost_usd: float = 5.0
    max_duration_minutes: int = 30


class SafeAgentExecutor:
    def __init__(self, limits: ExecutionLimits):
        self.limits = limits
        self.start_time = datetime.now()
        self.total_cost = 0.0

    def check_limits(self, agent: AutonomousCodingAgent) -> Optional[str]:
        """Return error message if any limit exceeded"""
        # Time limit (total_seconds, not .seconds, which discards whole days)
        elapsed = (datetime.now() - self.start_time).total_seconds() / 60
        if elapsed > self.limits.max_duration_minutes:
            return f"Timeout: {elapsed:.1f} minutes elapsed"

        # Action limit
        if len(agent.memory.history) > self.limits.max_total_actions:
            return f"Action limit: {len(agent.memory.history)} actions"

        # Cost limit (rough estimate of $0.003 per API call for Sonnet)
        estimated_cost = len(agent.memory.history) * 0.003
        if estimated_cost > self.limits.max_cost_usd:
            return f"Cost limit: ~${estimated_cost:.2f} spent"

        return None
```
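The limit arithmetic can be exercised without an API key by extracting it into a free function over plain values (a test-only restatement of `check_limits`; the agent object is replaced by an action count):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ExecutionLimits:
    max_total_actions: int = 100
    max_api_calls: int = 50
    max_cost_usd: float = 5.0
    max_duration_minutes: int = 30

def check_limits(limits, start_time, action_count, cost_per_call=0.003):
    """Standalone version of SafeAgentExecutor.check_limits, for testing."""
    elapsed = (datetime.now() - start_time).total_seconds() / 60
    if elapsed > limits.max_duration_minutes:
        return f"Timeout: {elapsed:.1f} minutes elapsed"
    if action_count > limits.max_total_actions:
        return f"Action limit: {action_count} actions"
    estimated_cost = action_count * cost_per_call
    if estimated_cost > limits.max_cost_usd:
        return f"Cost limit: ~${estimated_cost:.2f} spent"
    return None

limits = ExecutionLimits(max_total_actions=10)
print(check_limits(limits, datetime.now(), action_count=11))  # Action limit: 11 actions
```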
Logging for Post-Mortem Analysis
```python
import json
from pathlib import Path


def save_execution_trace(agent: AutonomousCodingAgent, task_id: str):
    """Save full execution history for debugging"""
    trace = {
        "task_id": task_id,
        "total_actions": len(agent.memory.history),
        "actions": [
            {
                "type": a.action_type,
                "params": a.parameters,
                "result_preview": a.result[:500],
                "timestamp": a.timestamp.isoformat()
            }
            for a in agent.memory.history
        ]
    }
    trace_file = Path(f"agent_traces/{task_id}.json")
    trace_file.parent.mkdir(exist_ok=True)
    trace_file.write_text(json.dumps(trace, indent=2))
```
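On the post-mortem side, the saved trace can be loaded back to surface the most repeated action type, which is usually where the loop lived. The trace below is hand-written example data in the same shape `save_execution_trace` emits:

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Example trace data, mirroring the structure written by save_execution_trace
trace = {
    "task_id": "demo",
    "total_actions": 4,
    "actions": [
        {"type": "run_test", "params": {}, "result_preview": "FAILED", "timestamp": "2025-01-01T00:00:00"},
        {"type": "edit_file", "params": {"file": "app.py"}, "result_preview": "ok", "timestamp": "2025-01-01T00:00:01"},
        {"type": "run_test", "params": {}, "result_preview": "FAILED", "timestamp": "2025-01-01T00:00:02"},
        {"type": "run_test", "params": {}, "result_preview": "FAILED", "timestamp": "2025-01-01T00:00:03"},
    ],
}
path = Path(tempfile.mkdtemp()) / "demo.json"
path.write_text(json.dumps(trace, indent=2))

# Post-mortem: which action type dominated the run?
loaded = json.loads(path.read_text())
top_action, count = Counter(a["type"] for a in loaded["actions"]).most_common(1)[0]
print(top_action, count)  # run_test 3
```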
What You Learned
- AI agents need explicit action history to avoid loops
- Three detection patterns: action signature, result similarity, iteration count
- Production systems need circuit breakers and cost limits
Key insight: The orchestration layer must provide memory that the LLM lacks natively.
Limitations:
- This adds latency (history serialization per iteration)
- Won't catch semantic loops (agent tries "similar but different" approaches)
- Requires tuning threshold values per task type
Real-World Example: Loop in Production
Scenario: Agent debugging a failing TypeScript build
Without loop detection:
1. Run `npm run build` → Error: Cannot find module 'utils'
2. Add import statement to utils.ts
3. Run `npm run build` → Error: Cannot find module 'utils'
4. Check if utils.ts exists (it does)
5. Run `npm run build` → Error: Cannot find module 'utils'
6. Add different import syntax
7. Run `npm run build` → Error: Cannot find module 'utils'
... (continues for 30+ iterations)
With loop detection:
1. Run `npm run build` → Error: Cannot find module 'utils'
2. Add import statement to utils.ts
3. Run `npm run build` → Error: Cannot find module 'utils'
4. LOOP DETECTED → Reflection prompt triggered
5. Agent realizes: "I've tried editing imports twice with same error.
The issue might be tsconfig.json moduleResolution settings."
6. Check tsconfig.json → finds misconfiguration
7. Update moduleResolution to "Node16"
8. Run `npm run build` → Success
Result: 8 actions instead of 30+, actual problem solved.
Advanced: Semantic Loop Detection
For production systems, add vector similarity comparison:
```python
from sentence_transformers import SentenceTransformer
import numpy as np


class SemanticLoopDetector:
    def __init__(self):
        # Lightweight embedding model
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def are_actions_similar(
        self,
        action1: ActionRecord,
        action2: ActionRecord,
        threshold: float = 0.85
    ) -> bool:
        """Check if two actions are semantically similar"""
        # Create text representation of actions
        text1 = f"{action1.action_type} {action1.parameters} {action1.result[:200]}"
        text2 = f"{action2.action_type} {action2.parameters} {action2.result[:200]}"

        # Get embeddings
        embeddings = self.model.encode([text1, text2])

        # Cosine similarity
        similarity = np.dot(embeddings[0], embeddings[1]) / (
            np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
        )
        return similarity > threshold
```
This catches cases where the agent tries "fix import statement" vs "update import path" - different actions, same intent.
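The similarity step itself is just a dot product over normalized vectors; stripped of the embedding model it reduces to the following (toy 3-d vectors stand in for the 384-d embeddings all-MiniLM-L6-v2 produces, but the math is identical):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

near = cosine([0.9, 0.1, 0.0], [0.8, 0.2, 0.0])  # similar "intents"
far = cosine([0.9, 0.1, 0.0], [0.0, 0.1, 0.9])   # unrelated "intents"
print(near > 0.85 > far)  # True: the 0.85 threshold separates them
```

The 0.85 threshold is a starting point, not a constant of nature; tune it against traces from your own agent.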
Tested with Claude Sonnet 4, GPT-4, Anthropic API 2026.01, Python 3.11+