Problem: Your AI Agent Does Things You Didn't Expect
You deployed an AI agent. It completed tasks autonomously — and some of those tasks caused real damage. It deleted the wrong records, sent an email to the wrong list, or made an API call you'd have vetoed if you'd seen it first.
Fully autonomous agents are powerful. They're also dangerous without the right guardrails. Human-in-the-Loop (HITL) architecture gives you a systematic way to keep humans in control without defeating the purpose of automation.
You'll learn:
- Where to place human checkpoints in an agent pipeline
- How to implement approval gates with async confirmation
- How to design graceful fallbacks when humans don't respond
Time: 20 min | Level: Intermediate
Why This Happens
Most agent failures trace back to one root cause: the agent was given too much autonomy over irreversible actions.
LLM agents are optimistic planners. They generate plausible-looking action sequences and execute them. Without checkpoints, an agent that misunderstands intent will confidently complete the wrong task — sometimes in ways you can't undo.
Common symptoms:
- Agent takes correct-looking steps toward the wrong goal
- Side effects propagate across systems before anyone notices
- Logs show successful execution; users see unexpected outcomes
The fix isn't to make the agent dumber — it's to define where human judgment is required and how to request it efficiently.
Core Concepts
The HITL Decision Matrix
Not every action needs human approval. Classify actions on two axes: reversibility and blast radius.
```
                     HIGH IMPACT
                          │
     Require              │    Require
     approval             │    approval
     (async ok)           │    (sync required)
                          │
LOW ──────────────────────┼────────────────── HIGH
REVERSIBLE                │        IRREVERSIBLE
                          │
     Run                  │    Require
     autonomously         │    approval
                          │    (log + notify)
                          │
                     LOW IMPACT
```
Use this matrix to annotate your tool definitions before you write a single line of agent code.
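The quadrants map to four routing modes. As a minimal sketch of how that annotation could drive routing (the function and mode names are illustrative, not from any framework):

```typescript
// Routing modes for the four quadrants of the HITL decision matrix.
type ApprovalMode =
  | "autonomous"          // low impact, reversible: just run
  | "async_approval"      // high impact, reversible: approve, but async is fine
  | "sync_approval"       // high impact, irreversible: block until approved
  | "approval_log_notify"; // low impact, irreversible: approve, log, and notify

// Hypothetical helper: map a tool's two risk axes to a routing mode.
function routeByMatrix(highImpact: boolean, irreversible: boolean): ApprovalMode {
  if (highImpact && irreversible) return "sync_approval";
  if (highImpact) return "async_approval";
  if (irreversible) return "approval_log_notify";
  return "autonomous";
}
```

A tool annotated once with these two booleans never needs per-call risk debates; the routing is mechanical.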
Three HITL Patterns
1. Checkpoint Gates — Agent pauses, presents a plan, waits for approval before proceeding.
2. Shadow Mode — Agent executes, but writes to a staging target. Human reviews and promotes.
3. Confidence Thresholds — Agent acts autonomously above a confidence score; escalates below it.
Most production systems combine all three.
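Checkpoint gates and confidence thresholds are implemented step by step in the Solution; shadow mode is not, so here is a minimal sketch of the idea, with an in-memory store standing in for a real staging target (all names are illustrative):

```typescript
// Shadow mode sketch: the agent writes to a staging area; a human
// reviews staged changes and promotes them to the live target.
type Change = { table: string; id: string; payload: unknown };

class ShadowStore {
  private staged: Change[] = [];
  private live: Change[] = [];

  // Agent writes land in staging, never directly in the live store.
  stage(change: Change): void {
    this.staged.push(change);
  }

  // Human review: promote one staged change to the live store.
  promote(index: number): void {
    const [change] = this.staged.splice(index, 1);
    if (change) this.live.push(change);
  }

  // Human review: discard a staged change without applying it.
  discard(index: number): void {
    this.staged.splice(index, 1);
  }

  pending(): readonly Change[] { return this.staged; }
  committed(): readonly Change[] { return this.live; }
}
```

In a real system the staging target would be a separate table, branch, or sandbox environment rather than an in-memory array, but the review-then-promote flow is the same.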
Solution
Step 1: Annotate Tools with Risk Metadata
Start by adding risk metadata to every tool your agent can call. This drives routing logic downstream.
```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";

interface AgentTool {
  name: string;
  description: string;
  risk: {
    level: RiskLevel;
    reversible: boolean;
    blastRadius: "local" | "user" | "org" | "external";
  };
  execute: (params: unknown) => Promise<unknown>;
}

const sendEmailTool: AgentTool = {
  name: "send_email",
  description: "Send an email to one or more recipients",
  risk: {
    level: "high",
    reversible: false,      // Can't unsend
    blastRadius: "external" // Leaves your system
  },
  execute: async (params) => { /* ... */ }
};

const readDatabaseTool: AgentTool = {
  name: "read_database",
  description: "Query records from the database",
  risk: {
    level: "low",
    reversible: true,
    blastRadius: "local"
  },
  execute: async (params) => { /* ... */ }
};
```
Expected: Every tool in your registry has risk metadata before the agent can access it.
If it fails:
- "Too many tools need approval": Re-examine blast radius. Read operations almost never need approval.
- "Risk levels are inconsistent": Define levels in a team doc. Subjective classification causes inconsistency.
Step 2: Build the Approval Gate
The approval gate intercepts high-risk tool calls, suspends the agent, and waits for a human decision.
```typescript
interface ApprovalRequest {
  id: string;
  agentRunId: string;
  tool: string;
  params: unknown;
  reasoning: string; // Agent explains why it's calling this tool
  expiresAt: Date;
  status: "pending" | "approved" | "rejected" | "expired";
}

class ApprovalGate {
  private store: Map<string, ApprovalRequest> = new Map();
  private timeoutMs: number;

  constructor(timeoutMs = 30 * 60 * 1000) { // 30 min default
    this.timeoutMs = timeoutMs;
  }

  async requestApproval(
    agentRunId: string,
    tool: AgentTool,
    params: unknown,
    reasoning: string
  ): Promise<"approved" | "rejected" | "expired"> {
    const request: ApprovalRequest = {
      id: crypto.randomUUID(),
      agentRunId,
      tool: tool.name,
      params,
      reasoning,
      expiresAt: new Date(Date.now() + this.timeoutMs),
      status: "pending"
    };
    this.store.set(request.id, request);
    await this.notifyReviewer(request); // Email, Slack, webhook — your choice

    // Poll for decision
    return this.waitForDecision(request.id);
  }

  private async waitForDecision(
    requestId: string
  ): Promise<"approved" | "rejected" | "expired"> {
    const pollInterval = 5000; // 5 seconds
    while (true) {
      const request = this.store.get(requestId)!;
      if (request.status !== "pending") {
        return request.status as "approved" | "rejected";
      }
      if (new Date() > request.expiresAt) {
        request.status = "expired";
        return "expired";
      }
      await new Promise(r => setTimeout(r, pollInterval));
    }
  }

  // Called by your approval UI / webhook
  resolve(requestId: string, decision: "approved" | "rejected") {
    const request = this.store.get(requestId);
    if (request && request.status === "pending") {
      request.status = decision;
    }
  }

  private async notifyReviewer(request: ApprovalRequest) {
    // Implement: Slack message, email, push notification
    console.log(`[APPROVAL NEEDED] ${request.tool}`, request);
  }
}
```
Why the polling loop: A webhook from the approval UI is more immediate, but it requires the UI to be able to call back into wherever the agent runtime is hosted. Polling works across environments with no inbound routing required. For production, replace the in-memory store and polling loop with a message queue or a database-backed store.
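If you do expose a callback endpoint for the approval UI, the routing logic can stay small. Here is a hedged sketch of a route parser; the `/approvals/<id>/<decision>` URL shape is an assumption of this example, not a standard, and the parsed result would be passed to the gate's `resolve()` method:

```typescript
// Hypothetical route parser for an approval callback endpoint.
// Expects: POST /approvals/<requestId>/<approved|rejected>
// Returns null for anything that doesn't match, so the caller can 404.
function parseApprovalRoute(
  method: string,
  url: string
): { requestId: string; decision: "approved" | "rejected" } | null {
  if (method !== "POST") return null;
  const match = url.match(/^\/approvals\/([^/]+)\/(approved|rejected)$/);
  if (!match) return null;
  return { requestId: match[1], decision: match[2] as "approved" | "rejected" };
}
```

Wire this into any HTTP server (for example, Node's built-in `node:http`): on a match, call `gate.resolve(requestId, decision)` and return 204; otherwise return 404.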
Step 3: Wire the Gate into Your Agent Executor
Intercept tool calls in your executor before they reach the tool itself.
```typescript
class HITLAgentExecutor {
  private gate: ApprovalGate;
  private tools: Map<string, AgentTool>;

  constructor(tools: AgentTool[], gate: ApprovalGate) {
    this.gate = gate;
    this.tools = new Map(tools.map(t => [t.name, t]));
  }

  async executeTool(
    agentRunId: string,
    toolName: string,
    params: unknown,
    agentReasoning: string
  ): Promise<unknown> {
    const tool = this.tools.get(toolName);
    if (!tool) throw new Error(`Unknown tool: ${toolName}`);

    // Low-risk: execute immediately
    if (tool.risk.level === "low") {
      return tool.execute(params);
    }

    // High-risk: gate it
    const decision = await this.gate.requestApproval(
      agentRunId,
      tool,
      params,
      agentReasoning
    );

    if (decision === "approved") {
      return tool.execute(params);
    }
    if (decision === "rejected") {
      // Return structured rejection — agent can try an alternative approach
      return { error: "TOOL_REJECTED", message: "Human reviewer rejected this action." };
    }
    // Expired — human didn't respond in time
    return { error: "APPROVAL_TIMEOUT", message: "No response from reviewer. Action skipped." };
  }
}
```
Expected: High-risk tool calls pause the agent. Low-risk calls execute without delay.
If it fails:
- "Agent loops on rejection": Add a max-retry limit per tool per run. After 2 rejections, abort and surface the issue to the user.
- "Timeouts cause lost work": Persist agent state before entering the approval gate. Resume from the checkpoint on approval.
Step 4: Add Confidence-Based Escalation
For agents using LLM reasoning, add a confidence check that escalates ambiguous cases before tool execution even begins.
```typescript
interface AgentStep {
  thought: string;
  tool: string;
  params: unknown;
  confidence: number; // 0.0 - 1.0, self-reported by the LLM
}

const CONFIDENCE_THRESHOLD = 0.8;

async function maybeEscalate(
  step: AgentStep,
  executor: HITLAgentExecutor,
  agentRunId: string
): Promise<unknown> {
  if (step.confidence < CONFIDENCE_THRESHOLD) {
    // Low confidence — ask human before even attempting.
    // Note: bracket access bypasses TypeScript's compile-time `private`
    // checks; in real code, expose a public escalate() method instead.
    const decision = await executor["gate"].requestApproval(
      agentRunId,
      executor["tools"].get(step.tool)!,
      step.params,
      `Low confidence (${step.confidence}): ${step.thought}`
    );
    if (decision !== "approved") {
      return { error: "ESCALATED_NOT_APPROVED" };
    }
  }
  return executor.executeTool(agentRunId, step.tool, step.params, step.thought);
}
```
Why self-reported confidence works: LLMs calibrate reasonably well when explicitly prompted to report uncertainty. Add this to your system prompt: "Before each action, rate your confidence from 0.0 to 1.0. If below 0.8, explain why."
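Because the confidence score comes back as model output, it is worth validating before comparing it to the threshold. A sketch that treats anything malformed as zero confidence, so a misbehaving model escalates rather than slipping past the gate (`normalizeConfidence` is an illustrative helper, not part of any SDK):

```typescript
// Normalize a self-reported confidence value from LLM output.
// Non-numeric or missing values become 0, which forces escalation;
// numeric values are clamped into [0, 1].
function normalizeConfidence(raw: unknown): number {
  const n = typeof raw === "number" ? raw : Number(raw);
  if (!Number.isFinite(n)) return 0; // missing/garbled → escalate
  return Math.min(1, Math.max(0, n));
}
```

Failing closed here matters: a model that stops emitting confidence scores should trigger more human review, not less.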
Verification
Deploy a test agent with a high-risk tool and verify the gate fires correctly:
```bash
# Unit test the approval gate
npx jest --testPathPattern approval-gate

# Integration test: confirm gate intercepts high-risk calls
npx ts-node scripts/test-hitl-integration.ts
```
You should see:
```
[APPROVAL NEEDED] send_email { to: "test@example.com", subject: "Test" }
Gate status: pending
[Manual: resolve as approved]
Gate status: approved
Tool executed: send_email ✓
```
Run the same scenario with resolve("rejected") — the tool should not execute, and the agent should receive the structured rejection response.
What You Learned
- Classify tools by reversibility and blast radius — not by gut feeling
- Approval gates should suspend the agent, not abort it — let it handle rejection gracefully
- Confidence thresholds catch ambiguous cases before they become high-risk calls
- Polling is simpler than webhooks for approval flows in early-stage systems
Limitations to know:
- HITL adds latency. For time-sensitive agents, set aggressive timeouts and clear fallback behavior.
- Self-reported LLM confidence is useful but not perfectly calibrated. Tune thresholds against real runs.
- This pattern doesn't replace observability. Log every tool call, decision, and outcome — HITL is a safety layer, not a substitute for monitoring.
When NOT to use this:
- Fully read-only agents with no external side effects
- Real-time systems where human latency is unacceptable (use shadow mode instead)
Tested with Node.js 22.x, TypeScript 5.7, OpenAI SDK 4.x, Anthropic SDK 0.39.x