Problem: Your AI Agent Does Things You Didn't Expect
You deployed an AI agent. It completed tasks autonomously — and some of those tasks caused real damage. It deleted the wrong records, sent an email to the wrong list, or made an API call you'd have vetoed if you'd seen it first.
Fully autonomous agents are powerful. They're also dangerous without the right guardrails. Human-in-the-Loop (HITL) architecture gives you a systematic way to keep humans in control without defeating the purpose of automation.
You'll learn:
- Where to place human checkpoints in an agent pipeline
- How to implement approval gates with async confirmation
- How to design graceful fallbacks when humans don't respond
Time: 20 min | Level: Intermediate
Why This Happens
Most agent failures trace back to one root cause: the agent was given too much autonomy over irreversible actions.
LLM agents are optimistic planners. They generate plausible-looking action sequences and execute them. Without checkpoints, an agent that misunderstands intent will confidently complete the wrong task — sometimes in ways you can't undo.
Common symptoms:
- Agent takes correct-looking steps toward the wrong goal
- Side effects propagate across systems before anyone notices
- Logs show successful execution; users see unexpected outcomes
The fix isn't to make the agent dumber — it's to define where human judgment is required and how to request it efficiently.
Core Concepts
The HITL Decision Matrix
Not every action needs human approval. Classify actions on two axes: reversibility and blast radius.
```
                     HIGH IMPACT
                          │
     Require              │    Require
     approval             │    approval
     (async ok)           │    (sync required)
                          │
LOW ──────────────────────┼────────────────── HIGH
REVERSIBLE                │        IRREVERSIBLE
                          │
     Run                  │    Require
     autonomously         │    approval
                          │    (log + notify)
                          │
                     LOW IMPACT
```
Use this matrix to annotate your tool definitions before you write a single line of agent code.
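The quadrants map to four routing modes. As a minimal sketch of how that annotation could drive routing (the function and mode names are illustrative, not from any framework):

```typescript
// Routing modes for the four quadrants of the HITL decision matrix.
type ApprovalMode =
  | "autonomous"          // low impact, reversible: just run
  | "async_approval"      // high impact, reversible: approve, but async is fine
  | "sync_approval"       // high impact, irreversible: block until approved
  | "approval_log_notify"; // low impact, irreversible: approve, log, and notify

// Hypothetical helper: map a tool's two risk axes to a routing mode.
function routeByMatrix(highImpact: boolean, irreversible: boolean): ApprovalMode {
  if (highImpact && irreversible) return "sync_approval";
  if (highImpact) return "async_approval";
  if (irreversible) return "approval_log_notify";
  return "autonomous";
}
```

A tool annotated once with these two booleans never needs per-call risk debates; the routing is mechanical.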
Three HITL Patterns
1. Checkpoint Gates — Agent pauses, presents a plan, waits for approval before proceeding.
2. Shadow Mode — Agent executes, but writes to a staging target. Human reviews and promotes.
3. Confidence Thresholds — Agent acts autonomously above a confidence score; escalates below it.
Most production systems combine all three.
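Checkpoint gates and confidence thresholds are implemented step by step in the Solution; shadow mode is not, so here is a minimal sketch of the idea, with an in-memory store standing in for a real staging target (all names are illustrative):

```typescript
// Shadow mode sketch: the agent writes to a staging area; a human
// reviews staged changes and promotes them to the live target.
type Change = { table: string; id: string; payload: unknown };

class ShadowStore {
  private staged: Change[] = [];
  private live: Change[] = [];

  // Agent writes land in staging, never directly in the live store.
  stage(change: Change): void {
    this.staged.push(change);
  }

  // Human review: promote one staged change to the live store.
  promote(index: number): void {
    const [change] = this.staged.splice(index, 1);
    if (change) this.live.push(change);
  }

  // Human review: discard a staged change without applying it.
  discard(index: number): void {
    this.staged.splice(index, 1);
  }

  pending(): readonly Change[] { return this.staged; }
  committed(): readonly Change[] { return this.live; }
}
```

In a real system the staging target would be a separate table, branch, or sandbox environment rather than an in-memory array, but the review-then-promote flow is the same.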
Solution
Step 1: Annotate Tools with Risk Metadata
Start by adding risk metadata to every tool your agent can call. This drives routing logic downstream.
```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";

interface AgentTool {
  name: string;
  description: string;
  risk: {
    level: RiskLevel;
    reversible: boolean;
    blastRadius: "local" | "user" | "org" | "external";
  };
  execute: (params: unknown) => Promise<unknown>;
}

const sendEmailTool: AgentTool = {
  name: "send_email",
  description: "Send an email to one or more recipients",
  risk: {
    level: "high",
    reversible: false,      // Can't unsend
    blastRadius: "external" // Leaves your system
  },
  execute: async (params) => { /* ... */ }
};

const readDatabaseTool: AgentTool = {
  name: "read_database",
  description: "Query records from the database",
  risk: {
    level: "low",
    reversible: true,
    blastRadius: "local"
  },
  execute: async (params) => { /* ... */ }
};
```
Expected: Every tool in your registry has risk metadata before the agent can access it.
If it fails:
- "Too many tools need approval": Re-examine blast radius. Read operations almost never need approval.
- "Risk levels are inconsistent": Define levels in a team doc. Subjective classification causes inconsistency.
Step 2: Build the Approval Gate
The approval gate intercepts high-risk tool calls, suspends the agent, and waits for a human decision.
```typescript
interface ApprovalRequest {
  id: string;
  agentRunId: string;
  tool: string;
  params: unknown;
  reasoning: string; // Agent explains why it's calling this tool
  expiresAt: Date;
  status: "pending" | "approved" | "rejected" | "expired";
}

class ApprovalGate {
  private store: Map<string, ApprovalRequest> = new Map();
  private timeoutMs: number;

  constructor(timeoutMs = 30 * 60 * 1000) { // 30 min default
    this.timeoutMs = timeoutMs;
  }

  async requestApproval(
    agentRunId: string,
    tool: AgentTool,
    params: unknown,
    reasoning: string
  ): Promise<"approved" | "rejected" | "expired"> {
    const request: ApprovalRequest = {
      id: crypto.randomUUID(),
      agentRunId,
      tool: tool.name,
      params,
      reasoning,
      expiresAt: new Date(Date.now() + this.timeoutMs),
      status: "pending"
    };
    this.store.set(request.id, request);
    await this.notifyReviewer(request); // Email, Slack, webhook — your choice

    // Poll for decision
    return this.waitForDecision(request.id);
  }

  private async waitForDecision(
    requestId: string
  ): Promise<"approved" | "rejected" | "expired"> {
    const pollInterval = 5000; // 5 seconds
    while (true) {
      const request = this.store.get(requestId)!;
      if (request.status !== "pending") {
        return request.status as "approved" | "rejected";
      }
      if (new Date() > request.expiresAt) {
        request.status = "expired";
        return "expired";
      }
      await new Promise(r => setTimeout(r, pollInterval));
    }
  }

  // Called by your approval UI / webhook
  resolve(requestId: string, decision: "approved" | "rejected") {
    const request = this.store.get(requestId);
    if (request && request.status === "pending") {
      request.status = decision;
    }
  }

  private async notifyReviewer(request: ApprovalRequest) {
    // Implement: Slack message, email, push notification
    console.log(`[APPROVAL NEEDED] ${request.tool}`, request);
  }
}
```
Why the polling loop: A webhook from the approval UI is more immediate, but it requires the UI to be able to call back into wherever the agent runtime is hosted. Polling works across environments with no inbound routing required. For production, replace the in-memory store and polling loop with a message queue or a database-backed store.
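If you do expose a callback endpoint for the approval UI, the routing logic can stay small. Here is a hedged sketch of a route parser; the `/approvals/<id>/<decision>` URL shape is an assumption of this example, not a standard, and the parsed result would be passed to the gate's `resolve()` method:

```typescript
// Hypothetical route parser for an approval callback endpoint.
// Expects: POST /approvals/<requestId>/<approved|rejected>
// Returns null for anything that doesn't match, so the caller can 404.
function parseApprovalRoute(
  method: string,
  url: string
): { requestId: string; decision: "approved" | "rejected" } | null {
  if (method !== "POST") return null;
  const match = url.match(/^\/approvals\/([^/]+)\/(approved|rejected)$/);
  if (!match) return null;
  return { requestId: match[1], decision: match[2] as "approved" | "rejected" };
}
```

Wire this into any HTTP server (for example, Node's built-in `node:http`): on a match, call `gate.resolve(requestId, decision)` and return 204; otherwise return 404.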
Step 3: Wire the Gate into Your Agent Executor
Intercept tool calls in your executor before they reach the tool itself.
```typescript
class HITLAgentExecutor {
  private gate: ApprovalGate;
  private tools: Map<string, AgentTool>;

  constructor(tools: AgentTool[], gate: ApprovalGate) {
    this.gate = gate;
    this.tools = new Map(tools.map(t => [t.name, t]));
  }

  async executeTool(
    agentRunId: string,
    toolName: string,
    params: unknown,
    agentReasoning: string
  ): Promise<unknown> {
    const tool = this.tools.get(toolName);
    if (!tool) throw new Error(`Unknown tool: ${toolName}`);

    // Low-risk: execute immediately
    if (tool.risk.level === "low") {
      return tool.execute(params);
    }

    // High-risk: gate it
    const decision = await this.gate.requestApproval(
      agentRunId,
      tool,
      params,
      agentReasoning
    );

    if (decision === "approved") {
      return tool.execute(params);
    }
    if (decision === "rejected") {
      // Return structured rejection — agent can try an alternative approach
      return { error: "TOOL_REJECTED", message: "Human reviewer rejected this action." };
    }
    // Expired — human didn't respond in time
    return { error: "APPROVAL_TIMEOUT", message: "No response from reviewer. Action skipped." };
  }
}
```
Expected: High-risk tool calls pause the agent. Low-risk calls execute without delay.
If it fails:
- "Agent loops on rejection": Add a max-retry limit per tool per run. After 2 rejections, abort and surface the issue to the user.
- "Timeouts cause lost work": Persist agent state before entering the approval gate. Resume from the checkpoint on approval.
Step 4: Add Confidence-Based Escalation
For agents using LLM reasoning, add a confidence check that escalates ambiguous cases before tool execution even begins.
```typescript
interface AgentStep {
  thought: string;
  tool: string;
  params: unknown;
  confidence: number; // 0.0 - 1.0, self-reported by the LLM
}

const CONFIDENCE_THRESHOLD = 0.8;

async function maybeEscalate(
  step: AgentStep,
  executor: HITLAgentExecutor,
  agentRunId: string
): Promise<unknown> {
  if (step.confidence < CONFIDENCE_THRESHOLD) {
    // Low confidence — ask human before even attempting.
    // Note: bracket access bypasses TypeScript's compile-time `private`
    // checks; in real code, expose a public escalate() method instead.
    const decision = await executor["gate"].requestApproval(
      agentRunId,
      executor["tools"].get(step.tool)!,
      step.params,
      `Low confidence (${step.confidence}): ${step.thought}`
    );
    if (decision !== "approved") {
      return { error: "ESCALATED_NOT_APPROVED" };
    }
  }
  return executor.executeTool(agentRunId, step.tool, step.params, step.thought);
}
```
Why self-reported confidence works: LLMs calibrate reasonably well when explicitly prompted to report uncertainty. Add this to your system prompt: "Before each action, rate your confidence from 0.0 to 1.0. If below 0.8, explain why."
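Because the confidence score comes back as model output, it is worth validating before comparing it to the threshold. A sketch that treats anything malformed as zero confidence, so a misbehaving model escalates rather than slipping past the gate (`normalizeConfidence` is an illustrative helper, not part of any SDK):

```typescript
// Normalize a self-reported confidence value from LLM output.
// Non-numeric or missing values become 0, which forces escalation;
// numeric values are clamped into [0, 1].
function normalizeConfidence(raw: unknown): number {
  const n = typeof raw === "number" ? raw : Number(raw);
  if (!Number.isFinite(n)) return 0; // missing/garbled → escalate
  return Math.min(1, Math.max(0, n));
}
```

Failing closed here matters: a model that stops emitting confidence scores should trigger more human review, not less.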
Verification
Deploy a test agent with a high-risk tool and verify the gate fires correctly:
```bash
# Unit test the approval gate
npx jest --testPathPattern approval-gate

# Integration test: confirm gate intercepts high-risk calls
npx ts-node scripts/test-hitl-integration.ts
```
You should see:
```
[APPROVAL NEEDED] send_email { to: "test@example.com", subject: "Test" }
Gate status: pending
[Manual: resolve as approved]
Gate status: approved
Tool executed: send_email ✓
```
Run the same scenario with resolve("rejected") — the tool should not execute, and the agent should receive the structured rejection response.
What You Learned
- Classify tools by reversibility and blast radius — not by gut feeling
- Approval gates should suspend the agent, not abort it — let it handle rejection gracefully
- Confidence thresholds catch ambiguous cases before they become high-risk calls
- Polling is simpler than webhooks for approval flows in early-stage systems
Limitations to know:
- HITL adds latency. For time-sensitive agents, set aggressive timeouts and clear fallback behavior.
- Self-reported LLM confidence is useful but not perfectly calibrated. Tune thresholds against real runs.
- This pattern doesn't replace observability. Log every tool call, decision, and outcome — HITL is a safety layer, not a substitute for monitoring.
When NOT to use this:
- Fully read-only agents with no external side effects
- Real-time systems where human latency is unacceptable (use shadow mode instead)
Tested with Node.js 22.x, TypeScript 5.7, OpenAI SDK 4.x, Anthropic SDK 0.39.x