Build an AI Quant Trading Bot: Backtest, Risk Rules, Live Execution 2026

Build a Python AI trading bot with vectorbt backtesting, LLM signal generation, position sizing rules, and live execution via CCXT. Production-ready guide.

Problem: LLM Signal Generation Without Risk Controls Blows Up Accounts

You've seen demos of GPT-4 or Claude analyzing charts and spitting out trade signals. What they skip: without a backtesting framework to validate signals historically, hard position-sizing rules, and a battle-tested execution layer, you're not trading — you're gambling with an API key.

This guide builds the complete stack: vectorbt for backtesting, an LLM agent for signal scoring, Kelly Criterion for position sizing, and CCXT for live execution with circuit breakers.

You'll learn:

  • How to structure a backtestable signal pipeline that also runs live
  • Where to inject LLM analysis without making it a single point of failure
  • Risk rules that survive real market conditions: max drawdown kill switch, per-trade stop loss, daily loss limit
  • How to wire CCXT to Binance (or any CEX) with paper trading mode first

Time: 45 min | Difficulty: Advanced


Architecture Overview

The system has four layers. Each is independently testable.

Market Data (OHLCV)
        │
        ▼
Signal Engine ──► LLM Scorer (optional alpha layer)
        │
        ▼
Risk Manager (position sizing, kill switches)
        │
        ▼
Execution Layer (CCXT → paper or live)
        │
        ▼
Trade Logger (SQLite / CSV for audit trail)

The LLM sits in the signal scoring layer, not the execution layer. It adds conviction weight to quantitative signals — it never fires orders directly.
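Because the layers only touch through plain method calls, each one can be stubbed in tests. A minimal sketch with `typing.Protocol` (the interface names here are ours, for illustration — the guide's concrete classes follow in the steps below):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class RiskGate(Protocol):
    def approve_trade(self, conviction: float) -> bool: ...

@runtime_checkable
class ExecutionVenue(Protocol):
    def place_order(self, order) -> dict: ...

# A do-nothing venue satisfies the contract, which is handy for
# unit-testing the signal and risk layers without an exchange.
class NullVenue:
    def place_order(self, order) -> dict:
        return {"status": "noop"}

print(isinstance(NullVenue(), ExecutionVenue))  # True
```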


Prerequisites

  • Python 3.11+
  • A Binance account (testnet works; free to create)
  • An Anthropic or OpenAI API key for signal scoring
  • ~4GB RAM for vectorbt on a year of minute data

# Install all dependencies
pip install vectorbt ccxt anthropic pandas numpy python-dotenv rich

# Verify vectorbt (it's the heaviest dep)
python -c "import vectorbt as vbt; print(vbt.__version__)"

Expected: 0.26.x or higher


Solution

Step 1: Build the Data Layer

Everything starts with clean OHLCV data. We fetch from Binance and cache locally so backtests don't hammer the API.

# data/fetcher.py
import ccxt
import pandas as pd
from pathlib import Path
import time

def fetch_ohlcv(
    symbol: str = "BTC/USDT",
    timeframe: str = "1h",
    limit: int = 1000,
    exchange_id: str = "binance",
) -> pd.DataFrame:
    """
    Fetch OHLCV and cache to parquet.
    Re-fetches only if cache is older than 1 hour.
    """
    cache_path = Path(f"data/cache/{symbol.replace('/', '_')}_{timeframe}.parquet")
    cache_path.parent.mkdir(parents=True, exist_ok=True)

    # Use cache if fresh (avoids rate limit during dev)
    if cache_path.exists():
        age_seconds = time.time() - cache_path.stat().st_mtime
        if age_seconds < 3600:
            return pd.read_parquet(cache_path)

    exchange = ccxt.binance({"enableRateLimit": True})

    # Binance caps each request at 1000 candles, so paginate for larger limits
    ms_per_candle = exchange.parse_timeframe(timeframe) * 1000
    since = exchange.milliseconds() - limit * ms_per_candle
    raw: list = []
    while len(raw) < limit:
        batch = exchange.fetch_ohlcv(symbol, timeframe, since=since, limit=1000)
        if not batch:
            break
        raw.extend(batch)
        since = batch[-1][0] + ms_per_candle
    raw = raw[-limit:]

    df = pd.DataFrame(raw, columns=["timestamp", "open", "high", "low", "close", "volume"])
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
    df.set_index("timestamp", inplace=True)

    df.to_parquet(cache_path)
    return df
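The freshness check is just file-mtime arithmetic, so you can verify it without touching the network. A throwaway temp file stands in for the parquet cache here:

```python
import tempfile
import time
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    cache = Path(d) / "BTC_USDT_1h.parquet"
    cache.touch()  # Stand-in for a freshly written cache file

    # Same staleness check as fetch_ohlcv
    age_seconds = time.time() - cache.stat().st_mtime
    fresh = age_seconds < 3600
    print(f"age={age_seconds:.2f}s fresh={fresh}")
```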

Step 2: Build the Signal Engine

We use a classic RSI + EMA crossover as the quantitative backbone. This is the signal the LLM will score, not replace.

# signals/quant_signals.py
import pandas as pd
import numpy as np

def compute_signals(df: pd.DataFrame) -> pd.DataFrame:
    """
    Returns DataFrame with columns: rsi, ema_fast, ema_slow, signal
    signal = 1 (long), -1 (short), 0 (flat)
    """
    close = df["close"]

    # RSI-14
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    rs = gain / loss.replace(0, np.nan)
    df["rsi"] = 100 - (100 / (1 + rs))

    # EMA crossover: fast=12, slow=26 (same as MACD baseline)
    df["ema_fast"] = close.ewm(span=12, adjust=False).mean()
    df["ema_slow"] = close.ewm(span=26, adjust=False).mean()

    # Raw signal: both conditions must align
    ema_cross_long = df["ema_fast"] > df["ema_slow"]
    rsi_not_overbought = df["rsi"] < 70  # Don't buy into overbought

    df["signal"] = 0
    df.loc[ema_cross_long & rsi_not_overbought, "signal"] = 1
    df.loc[~ema_cross_long & (df["rsi"] > 30), "signal"] = -1  # Don't short into oversold

    return df.dropna()
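A quick sanity check of the RSI arithmetic on synthetic data — a standalone copy of the same rolling-mean formula, run on a seeded uptrend, where RSI should stay bounded and skew above the midline:

```python
import numpy as np
import pandas as pd

# Seeded synthetic uptrend: drift 0.5/bar plus unit noise
rng = np.random.default_rng(42)
close = pd.Series(100 + 0.5 * np.arange(200) + rng.normal(0, 1.0, 200))

# Same RSI-14 arithmetic as compute_signals
delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
rsi = (100 - 100 / (1 + gain / loss.replace(0, np.nan))).dropna()

print(f"RSI range: {rsi.min():.1f} to {rsi.max():.1f}, mean: {rsi.mean():.1f}")
```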

Step 3: Add the LLM Scoring Layer

The LLM reads recent OHLCV context plus current signal and returns a conviction score from -1.0 to 1.0. A score below 0.3 suppresses the trade — this is your AI-powered filter, not your trigger.

# signals/llm_scorer.py
import anthropic
import json

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a quantitative analyst assistant. You receive OHLCV summary data 
and a proposed trade signal. Return ONLY valid JSON with this schema:
{"conviction": float, "reasoning": str, "suppress": bool}

conviction: -1.0 (strong counter) to 1.0 (strong confirm)
suppress: true if conviction < 0.3 or you see clear invalidating pattern
reasoning: max 20 words"""

def score_signal(
    symbol: str,
    signal_direction: int,  # 1 or -1
    recent_candles_summary: str,
    rsi: float,
    ema_spread_pct: float,
) -> dict:
    """
    Score a quant signal with LLM analysis.
    Falls back to {"conviction": 0.5, "suppress": False} on any API error
    so the quant signal still fires if LLM is unavailable.
    """
    prompt = f"""Symbol: {symbol}
Proposed signal: {"LONG" if signal_direction == 1 else "SHORT"}
RSI: {rsi:.1f}
EMA spread: {ema_spread_pct:.2f}%
Recent price action: {recent_candles_summary}

Score this signal."""

    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=150,
            system=SYSTEM_PROMPT,
            messages=[{"role": "user", "content": prompt}],
        )
        return json.loads(response.content[0].text)
    except Exception as e:
        # LLM failure is non-fatal — quant signal still executes at 50% conviction
        print(f"[LLM scorer fallback] {e}")
        return {"conviction": 0.5, "suppress": False, "reasoning": "fallback"}

Why fallback matters: If Anthropic has a 30-second outage during a signal event, you still execute — at reduced conviction, not zero. Never let your LLM layer be a single point of failure.
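One more failure mode worth guarding: models sometimes wrap the JSON in markdown fences or add prose, which `json.loads` rejects even though the payload is fine. A tolerant extraction helper (our own addition, not part of the code above) keeps the hard fallback for genuinely unusable output:

```python
import json

def extract_json(text: str, fallback: dict) -> dict:
    """Pull the first {...} object out of an LLM reply, tolerating fences/prose."""
    try:
        start = text.index("{")
        end = text.rindex("}") + 1
        return json.loads(text[start:end])
    except ValueError:  # covers .index() misses and JSONDecodeError (a subclass)
        return fallback

fb = {"conviction": 0.5, "suppress": False, "reasoning": "fallback"}
wrapped = '```json\n{"conviction": 0.8, "suppress": false, "reasoning": "confirm"}\n```'
print(extract_json(wrapped, fb)["conviction"])    # 0.8
print(extract_json("no json here", fb)["reasoning"])  # fallback
```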


Step 4: Build the Risk Manager

This is the most critical layer. Three rules are non-negotiable in production.

# risk/manager.py
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RiskConfig:
    max_position_pct: float = 0.02      # Never risk more than 2% of portfolio per trade
    daily_loss_limit_pct: float = 0.05  # Kill switch: halt trading after 5% daily loss
    max_drawdown_pct: float = 0.15      # Kill switch: halt after 15% drawdown from peak
    stop_loss_pct: float = 0.015        # 1.5% stop loss per trade
    min_conviction: float = 0.3         # LLM conviction threshold to allow trade

@dataclass
class RiskState:
    peak_equity: float
    current_equity: float
    daily_start_equity: float
    daily_losses: float = 0.0
    halted: bool = False
    halt_reason: str = ""
    trade_date: date = field(default_factory=date.today)

class RiskManager:
    def __init__(self, config: RiskConfig, initial_equity: float):
        self.config = config
        self.state = RiskState(
            peak_equity=initial_equity,
            current_equity=initial_equity,
            daily_start_equity=initial_equity,
        )

    def kelly_position_size(
        self,
        equity: float,
        win_rate: float,         # From backtest
        avg_win_pct: float,
        avg_loss_pct: float,
        conviction: float,
    ) -> float:
        """
        Half-Kelly position sizing, scaled by LLM conviction.
        Returns a fraction of equity to deploy.
        Kelly fraction: f* = p - q/b, where p = win_rate, q = 1 - win_rate,
        and b = avg_win_pct / avg_loss_pct (payoff ratio).
        We use half-Kelly to reduce variance.
        """
        if avg_loss_pct == 0 or avg_win_pct == 0:
            return 0.0

        payoff_ratio = avg_win_pct / avg_loss_pct
        kelly = win_rate - (1 - win_rate) / payoff_ratio
        half_kelly = kelly / 2

        # Scale by conviction (minimum conviction 0.3 yields 30% of half-Kelly)
        scaled = half_kelly * conviction

        # Hard cap: never exceed max_position_pct regardless of Kelly output
        return min(max(scaled, 0.0), self.config.max_position_pct)

    def check_and_update(self, current_equity: float) -> bool:
        """
        Returns True if trading is allowed.
        Updates kill switches based on current equity.
        """
        # Reset daily tracking on new day
        if date.today() != self.state.trade_date:
            self.state.daily_start_equity = current_equity
            self.state.daily_losses = 0.0
            self.state.trade_date = date.today()

        self.state.current_equity = current_equity
        self.state.peak_equity = max(self.state.peak_equity, current_equity)

        # Kill switch 1: max drawdown from peak
        drawdown = (self.state.peak_equity - current_equity) / self.state.peak_equity
        if drawdown >= self.config.max_drawdown_pct:
            self.state.halted = True
            self.state.halt_reason = f"Max drawdown {drawdown:.1%} exceeded {self.config.max_drawdown_pct:.1%}"
            return False

        # Kill switch 2: daily loss limit
        daily_loss = (self.state.daily_start_equity - current_equity) / self.state.daily_start_equity
        if daily_loss >= self.config.daily_loss_limit_pct:
            self.state.halted = True
            self.state.halt_reason = f"Daily loss {daily_loss:.1%} exceeded {self.config.daily_loss_limit_pct:.1%}"
            return False

        return True

    def approve_trade(self, conviction: float) -> bool:
        """Check conviction threshold before allowing execution."""
        if self.state.halted:
            return False
        if conviction < self.config.min_conviction:
            return False
        return True

Step 5: Backtest the Strategy with vectorbt

Before touching live markets, validate your combined signal → risk pipeline on historical data. vectorbt runs vectorized backtests in seconds, not minutes.

# backtest/run_backtest.py
import vectorbt as vbt
import pandas as pd
from data.fetcher import fetch_ohlcv
from signals.quant_signals import compute_signals

def run_backtest(symbol: str = "BTC/USDT", timeframe: str = "1h"):
    df = fetch_ohlcv(symbol, timeframe, limit=5000)  # ~7 months of 1h data
    df = compute_signals(df)

    close = df["close"]
    entries = df["signal"] == 1
    exits = df["signal"] == -1

    # 1.5% stop loss (matches RiskConfig.stop_loss_pct)
    pf = vbt.Portfolio.from_signals(
        close,
        entries,
        exits,
        sl_stop=0.015,          # Stop loss at 1.5%
        init_cash=10_000,
        fees=0.001,              # 0.1% maker/taker (Binance standard)
        slippage=0.0005,         # 0.05% slippage estimate
    )

    stats = pf.stats()
    print(stats)

    # Extract win rate and avg win/loss for Kelly sizing
    trades = pf.trades.records_readable
    win_rate = (trades["PnL"] > 0).mean()
    avg_win = trades.loc[trades["PnL"] > 0, "Return"].mean()
    avg_loss = abs(trades.loc[trades["PnL"] < 0, "Return"].mean())

    return {
        "total_return": stats["Total Return [%]"],
        "sharpe": stats["Sharpe Ratio"],
        "max_drawdown": stats["Max Drawdown [%]"],
        "win_rate": win_rate,
        "avg_win_pct": avg_win,
        "avg_loss_pct": avg_loss,
        "n_trades": len(trades),
    }

if __name__ == "__main__":
    results = run_backtest()
    print(f"\nWin rate: {results['win_rate']:.1%}")
    print(f"Sharpe: {results['sharpe']:.2f}")
    print(f"Max drawdown: {results['max_drawdown']:.1f}%")
    print(f"Kelly inputs → win_rate={results['win_rate']:.2f}, avg_win={results['avg_win_pct']:.3f}, avg_loss={results['avg_loss_pct']:.3f}")

Before going live, require:

  • Sharpe Ratio > 1.0
  • Max Drawdown < 20%
  • N trades > 50 (statistical significance)

If your backtest doesn't pass these gates, fix the signal — not the risk rules.
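These gates are easy to enforce mechanically. A small helper (illustrative — wire it into `run_backtest` yourself) that reports every failed gate instead of letting you eyeball the stats:

```python
def failed_golive_gates(results: dict) -> list[str]:
    """Return the list of failed go-live gates (empty list means go)."""
    failures = []
    if results["sharpe"] <= 1.0:
        failures.append(f"Sharpe {results['sharpe']:.2f} (need > 1.0)")
    if results["max_drawdown"] >= 20.0:
        failures.append(f"Max drawdown {results['max_drawdown']:.1f}% (need < 20%)")
    if results["n_trades"] <= 50:
        failures.append(f"{results['n_trades']} trades (need > 50)")
    return failures

example = {"sharpe": 1.24, "max_drawdown": 12.4, "n_trades": 87}
print(failed_golive_gates(example))  # []
```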


Step 6: Wire the Execution Layer

CCXT abstracts 100+ exchanges behind one interface. We use Binance Testnet for paper trading before flipping to live.

# execution/executor.py
import ccxt
import os
from dataclasses import dataclass

@dataclass
class Order:
    symbol: str
    side: str          # "buy" or "sell"
    amount_usdt: float
    stop_loss_pct: float = 0.015

class Executor:
    def __init__(self, paper: bool = True):
        api_key = os.getenv("BINANCE_API_KEY")
        api_secret = os.getenv("BINANCE_API_SECRET")

        self.exchange = ccxt.binance({
            "apiKey": api_key,
            "secret": api_secret,
            "enableRateLimit": True,
            "options": {"defaultType": "spot"},
        })
        if paper:
            # Route all requests to Binance Testnet (testnet.binance.vision)
            self.exchange.set_sandbox_mode(True)
        self.paper = paper

    def get_equity(self) -> float:
        """Return the USDT balance (free + locked). Note: quote currency only;
        open base-asset positions are not marked to market here."""
        balance = self.exchange.fetch_balance()
        return balance["USDT"]["total"]

    def place_order(self, order: Order) -> dict:
        """
        Place market buy/sell with immediate stop-loss OCO.
        Returns order dict or raises on failure.
        """
        ticker = self.exchange.fetch_ticker(order.symbol)
        current_price = ticker["last"]

        # Convert USDT amount to base asset quantity
        quantity = order.amount_usdt / current_price

        # Round to the exchange's lot size (markets must be loaded first;
        # amount_to_precision reads the precision rules itself)
        self.exchange.load_markets()
        quantity = self.exchange.amount_to_precision(order.symbol, quantity)

        if self.paper:
            print(f"[PAPER] {order.side.upper()} {quantity} {order.symbol} @ ~{current_price:.2f}")
            return {"status": "paper", "price": current_price, "amount": quantity}

        # Live order
        market_order = self.exchange.create_order(
            order.symbol,
            "market",
            order.side,
            quantity,
        )

        # Immediately place a stop loss. Binance spot has no "stop_market"
        # type (that's futures-only); a limit order with a stopPrice param
        # becomes a STOP_LOSS_LIMIT on spot.
        stop_price = current_price * (1 - order.stop_loss_pct) if order.side == "buy" \
                     else current_price * (1 + order.stop_loss_pct)
        stop_price = float(self.exchange.price_to_precision(order.symbol, stop_price))

        self.exchange.create_order(
            order.symbol,
            "limit",
            "sell" if order.side == "buy" else "buy",
            quantity,
            price=stop_price,
            params={"stopPrice": stop_price},
        )

        return market_order
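The notional-to-quantity conversion and lot rounding are worth sanity-checking offline. A standalone sketch — the `0.00001` step size is illustrative; live, `amount_to_precision` reads the real LOT_SIZE filter from the exchange:

```python
from decimal import Decimal, ROUND_DOWN

def usdt_to_lot(amount_usdt: float, price: float, step: str = "0.00001") -> Decimal:
    """Quote notional -> base quantity, rounded DOWN to the lot step
    (rounding up would risk 'insufficient balance' rejections)."""
    qty = Decimal(str(amount_usdt)) / Decimal(str(price))
    return qty.quantize(Decimal(step), rounding=ROUND_DOWN)

print(usdt_to_lot(200.0, 60_000.0))  # 0.00333
```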

Step 7: Wire Everything Into the Trading Loop

# main.py
import time
import os
from dotenv import load_dotenv
from data.fetcher import fetch_ohlcv
from signals.quant_signals import compute_signals
from signals.llm_scorer import score_signal
from risk.manager import RiskManager, RiskConfig
from execution.executor import Executor, Order

load_dotenv()

# --- Config ---
SYMBOL = "BTC/USDT"
TIMEFRAME = "1h"
PAPER_TRADING = True  # Set False only after paper trading passes 2 weeks

# From your backtest output
BACKTEST_WIN_RATE = 0.54
BACKTEST_AVG_WIN = 0.022
BACKTEST_AVG_LOSS = 0.014

def run_loop():
    executor = Executor(paper=PAPER_TRADING)
    initial_equity = executor.get_equity()
    risk = RiskManager(RiskConfig(), initial_equity)
    last_signal = 0  # Last signal direction we acted on

    print(f"Starting with equity: ${initial_equity:.2f} | Paper: {PAPER_TRADING}")

    while True:
        try:
            # 1. Fetch fresh data; act on the last *closed* candle, not the forming one
            df = fetch_ohlcv(SYMBOL, TIMEFRAME, limit=100)
            df = compute_signals(df)
            latest = df.iloc[-2]

            # Skip if flat, or if we already acted on this signal direction
            if latest["signal"] == 0 or int(latest["signal"]) == last_signal:
                time.sleep(60)
                continue

            # 2. Check risk kill switches
            current_equity = executor.get_equity()
            if not risk.check_and_update(current_equity):
                print(f"[HALT] {risk.state.halt_reason}")
                break

            # 3. Score signal with LLM
            recent_summary = (
                f"Last 5 candles: {df['close'].tail(5).round(0).tolist()}. "
                f"Volume trend: {'rising' if df['volume'].tail(3).is_monotonic_increasing else 'falling'}."
            )
            ema_spread = (latest["ema_fast"] - latest["ema_slow"]) / latest["ema_slow"] * 100
            llm_result = score_signal(
                SYMBOL, int(latest["signal"]), recent_summary,
                float(latest["rsi"]), float(ema_spread)
            )

            conviction = llm_result["conviction"]
            print(f"Signal: {'LONG' if latest['signal'] == 1 else 'SHORT'} | "
                  f"Conviction: {conviction:.2f} | Suppress: {llm_result['suppress']} | "
                  f"{llm_result['reasoning']}")

            if llm_result["suppress"] or not risk.approve_trade(conviction):
                print("Trade suppressed by risk/LLM filter.")
                time.sleep(60)
                continue

            # 4. Kelly position sizing
            position_pct = risk.kelly_position_size(
                current_equity,
                BACKTEST_WIN_RATE,
                BACKTEST_AVG_WIN,
                BACKTEST_AVG_LOSS,
                conviction,
            )
            trade_usdt = current_equity * position_pct
            print(f"Position size: ${trade_usdt:.2f} ({position_pct:.2%} of equity)")

            # 5. Execute, then record the direction so we don't re-enter on the next poll
            side = "buy" if latest["signal"] == 1 else "sell"
            order = Order(SYMBOL, side, trade_usdt)
            result = executor.place_order(order)
            last_signal = int(latest["signal"])
            print(f"Order result: {result}")

        except KeyboardInterrupt:
            print("Shutting down trading loop.")
            break
        except Exception as e:
            print(f"[ERROR] {e}")
            time.sleep(30)  # Back off 30s on unexpected error, don't crash

        time.sleep(60)  # 1-minute polling interval for 1h bars is sufficient

if __name__ == "__main__":
    run_loop()

Verification

Run the backtest first and confirm it passes the gates:

python backtest/run_backtest.py

You should see output like:

Start Value                   10000.00
End Value                     14230.00
Total Return [%]                 42.30
Sharpe Ratio                      1.24
Max Drawdown [%]                 12.40
Total Trades                        87

Win rate: 54.0%
Sharpe: 1.24
Max drawdown: 12.4%

Then run in paper mode for at least two weeks:

# Copy .env.example to .env and fill in testnet keys
cp .env.example .env

# Paper trading — safe to run 24/7
python main.py

Check the kill switches work:

python -c "
from risk.manager import RiskManager, RiskConfig
rm = RiskManager(RiskConfig(), 10000)
# Simulate 16% drawdown — should trigger halt
allowed = rm.check_and_update(8400)
print('Trading allowed:', allowed)
print('Halt reason:', rm.state.halt_reason)
"

Expected:

Trading allowed: False
Halt reason: Max drawdown 16.0% exceeded 15.0%

Production Deployment Notes

Running this locally means your bot dies when your laptop sleeps. For always-on execution:

Minimum viable production setup:

  • A $6/mo VPS (DigitalOcean, Vultr) running Ubuntu 24.04
  • systemd service or Docker container for auto-restart
  • Telegram bot integration for halt alerts (add python-telegram-bot and call on halt)
  • Log all trades to SQLite with sqlite3 — you need an audit trail for tax purposes

# Quick systemd service
sudo tee /etc/systemd/system/tradingbot.service << EOF
[Unit]
Description=AI Trading Bot
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/tradingbot
ExecStart=/usr/bin/python3 main.py
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable tradingbot
sudo systemctl start tradingbot

What You Learned

  • The LLM belongs in the signal scoring layer, not the execution layer — it filters trades, never fires them
  • Kelly Criterion with a conviction multiplier makes position sizing dynamic and mathematically grounded
  • Three kill switches (per-trade stop loss, daily loss limit, max drawdown) need to be hardcoded, not configurable at runtime
  • Paper trade for two weeks minimum before live — the edge case that blows accounts usually shows up in week two
  • LLM scorer must have a fallback; API downtime during a signal event is not hypothetical

Hard limitations of this setup:

  • Signal is based on 1h bars — it misses intraday moves and is not suitable for scalping
  • Backtests don't capture slippage accurately on large size; scale up position sizes cautiously
  • The LLM scorer adds ~1–2 seconds of latency per signal event — acceptable for 1h bars, not for HFT

Tested on Python 3.12, vectorbt 0.26.2, ccxt 4.3.x, Binance Testnet, Ubuntu 24.04