Problem: LLM Signal Generation Without Risk Controls Blows Up Accounts
You've seen demos of GPT-4 or Claude analyzing charts and spitting out trade signals. What they skip: without a backtesting framework to validate signals historically, hard position-sizing rules, and a battle-tested execution layer, you're not trading — you're gambling with an API key.
This guide builds the complete stack: vectorbt for backtesting, an LLM agent for signal scoring, Kelly Criterion for position sizing, and CCXT for live execution with circuit breakers.
You'll learn:
- How to structure a backtestable signal pipeline that also runs live
- Where to inject LLM analysis without making it a single point of failure
- Risk rules that survive real market conditions: max drawdown kill switch, per-trade stop loss, daily loss limit
- How to wire CCXT to Binance (or any CEX) with paper trading mode first
Time: 45 min | Difficulty: Advanced
Architecture Overview
The system has four layers. Each is independently testable.
Market Data (OHLCV)
│
▼
Signal Engine ──► LLM Scorer (optional alpha layer)
│
▼
Risk Manager (position sizing, kill switches)
│
▼
Execution Layer (CCXT → paper or live)
│
▼
Trade Logger (SQLite / CSV for audit trail)
The LLM sits in the signal scoring layer, not the execution layer. It adds conviction weight to quantitative signals — it never fires orders directly.
Prerequisites
- Python 3.11+
- A Binance account (testnet works; free to create)
- An Anthropic or OpenAI API key for signal scoring
- ~4GB RAM for vectorbt on a year of minute data
# Install all dependencies
pip install vectorbt ccxt anthropic pandas numpy python-dotenv rich
# Verify vectorbt (it's the heaviest dep)
python -c "import vectorbt as vbt; print(vbt.__version__)"
Expected: 0.26.x or higher
Solution
Step 1: Build the Data Layer
Everything starts with clean OHLCV data. We fetch from Binance and cache locally so backtests don't hammer the API.
# data/fetcher.py
import ccxt
import pandas as pd
from pathlib import Path
import time
def fetch_ohlcv(
symbol: str = "BTC/USDT",
timeframe: str = "1h",
limit: int = 1000,
exchange_id: str = "binance",
) -> pd.DataFrame:
"""
Fetch OHLCV and cache to parquet.
Re-fetches only if cache is older than 1 hour.
"""
cache_path = Path(f"data/cache/{symbol.replace('/', '_')}_{timeframe}.parquet")
cache_path.parent.mkdir(parents=True, exist_ok=True)
# Use cache if fresh (avoids rate limit during dev)
if cache_path.exists():
age_seconds = time.time() - cache_path.stat().st_mtime
if age_seconds < 3600:
return pd.read_parquet(cache_path)
    exchange = ccxt.binance({"enableRateLimit": True})
    # Binance caps a single request at 1000 candles; page forward from a
    # computed start time so limit > 1000 actually returns that many bars
    tf_ms = exchange.parse_timeframe(timeframe) * 1000
    since = exchange.milliseconds() - limit * tf_ms
    raw = []
    while len(raw) < limit:
        batch = exchange.fetch_ohlcv(symbol, timeframe, since=since,
                                     limit=min(limit - len(raw), 1000))
        if not batch:
            break
        raw.extend(batch)
        since = batch[-1][0] + tf_ms  # Next page starts one candle later
df = pd.DataFrame(raw, columns=["timestamp", "open", "high", "low", "close", "volume"])
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
df.set_index("timestamp", inplace=True)
df.to_parquet(cache_path)
return df
Step 2: Build the Signal Engine
We use a classic RSI + EMA crossover as the quantitative backbone. This is the signal the LLM will score, not replace.
# signals/quant_signals.py
import pandas as pd
import numpy as np
def compute_signals(df: pd.DataFrame) -> pd.DataFrame:
"""
Returns DataFrame with columns: rsi, ema_fast, ema_slow, signal
signal = 1 (long), -1 (short), 0 (flat)
"""
close = df["close"]
# RSI-14
delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
rs = gain / loss.replace(0, np.nan)
df["rsi"] = 100 - (100 / (1 + rs))
# EMA crossover: fast=12, slow=26 (same as MACD baseline)
df["ema_fast"] = close.ewm(span=12, adjust=False).mean()
df["ema_slow"] = close.ewm(span=26, adjust=False).mean()
# Raw signal: both conditions must align
ema_cross_long = df["ema_fast"] > df["ema_slow"]
rsi_not_overbought = df["rsi"] < 70 # Don't buy into overbought
df["signal"] = 0
df.loc[ema_cross_long & rsi_not_overbought, "signal"] = 1
df.loc[~ema_cross_long & (df["rsi"] > 30), "signal"] = -1
return df.dropna()
Step 3: Add the LLM Scoring Layer
The LLM reads recent OHLCV context plus current signal and returns a conviction score from -1.0 to 1.0. A score below 0.3 suppresses the trade — this is your AI-powered filter, not your trigger.
# signals/llm_scorer.py
import anthropic
import json
client = anthropic.Anthropic()
SYSTEM_PROMPT = """You are a quantitative analyst assistant. You receive OHLCV summary data
and a proposed trade signal. Return ONLY valid JSON with this schema:
{"conviction": float, "reasoning": str, "suppress": bool}
conviction: -1.0 (strong counter) to 1.0 (strong confirm)
suppress: true if conviction < 0.3 or you see clear invalidating pattern
reasoning: max 20 words"""
def score_signal(
symbol: str,
signal_direction: int, # 1 or -1
recent_candles_summary: str,
rsi: float,
ema_spread_pct: float,
) -> dict:
"""
Score a quant signal with LLM analysis.
Falls back to {"conviction": 0.5, "suppress": False} on any API error
so the quant signal still fires if LLM is unavailable.
"""
prompt = f"""Symbol: {symbol}
Proposed signal: {"LONG" if signal_direction == 1 else "SHORT"}
RSI: {rsi:.1f}
EMA spread: {ema_spread_pct:.2f}%
Recent price action: {recent_candles_summary}
Score this signal."""
try:
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=150,
system=SYSTEM_PROMPT,
messages=[{"role": "user", "content": prompt}],
)
return json.loads(response.content[0].text)
except Exception as e:
# LLM failure is non-fatal — quant signal still executes at 50% conviction
print(f"[LLM scorer fallback] {e}")
return {"conviction": 0.5, "suppress": False, "reasoning": "fallback"}
Why fallback matters: If Anthropic has a 30-second outage during a signal event, you still execute — at reduced conviction, not zero. Never let your LLM layer be a single point of failure.
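One more failure mode that a bare json.loads doesn't cover: models occasionally wrap the JSON in markdown fences or prepend prose despite the system prompt. A defensive parser (a sketch; the helper name is ours, adapt the key check to your schema) extracts the first JSON object and validates it before trusting it:

```python
import json
import re

def parse_llm_json(text: str, fallback: dict) -> dict:
    """Extract the first {...} object from LLM output; return fallback on any failure."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return fallback
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return fallback
    # Enforce schema: a missing or non-numeric conviction falls back rather
    # than crashing the risk layer downstream
    if not isinstance(parsed.get("conviction"), (int, float)):
        return fallback
    return parsed
```

Swapping `json.loads(response.content[0].text)` for this keeps a fenced-but-valid response usable instead of triggering the fallback unnecessarily.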
Step 4: Build the Risk Manager
This is the most critical layer. Three rules are non-negotiable in production.
# risk/manager.py
from dataclasses import dataclass, field
from datetime import date
@dataclass
class RiskConfig:
max_position_pct: float = 0.02 # Never risk more than 2% of portfolio per trade
daily_loss_limit_pct: float = 0.05 # Kill switch: halt trading after 5% daily loss
max_drawdown_pct: float = 0.15 # Kill switch: halt after 15% drawdown from peak
stop_loss_pct: float = 0.015 # 1.5% stop loss per trade
min_conviction: float = 0.3 # LLM conviction threshold to allow trade
@dataclass
class RiskState:
peak_equity: float
current_equity: float
daily_start_equity: float
daily_losses: float = 0.0
halted: bool = False
halt_reason: str = ""
trade_date: date = field(default_factory=date.today)
class RiskManager:
def __init__(self, config: RiskConfig, initial_equity: float):
self.config = config
self.state = RiskState(
peak_equity=initial_equity,
current_equity=initial_equity,
daily_start_equity=initial_equity,
)
def kelly_position_size(
self,
equity: float,
win_rate: float, # From backtest
avg_win_pct: float,
avg_loss_pct: float,
conviction: float,
) -> float:
"""
Half-Kelly position sizing, scaled by LLM conviction.
Kelly fraction = (win_rate / avg_loss_pct) - ((1 - win_rate) / avg_win_pct)
We use half-Kelly to reduce variance.
"""
if avg_loss_pct == 0 or avg_win_pct == 0:
return 0.0
kelly = (win_rate / avg_loss_pct) - ((1 - win_rate) / avg_win_pct)
half_kelly = kelly / 2
# Scale by conviction (0.3 minimum conviction lets through 30% size)
scaled = half_kelly * conviction
# Hard cap: never exceed max_position_pct regardless of Kelly output
return min(max(scaled, 0.0), self.config.max_position_pct)
def check_and_update(self, current_equity: float) -> bool:
"""
Returns True if trading is allowed.
Updates kill switches based on current equity.
"""
# Reset daily tracking on new day
if date.today() != self.state.trade_date:
self.state.daily_start_equity = current_equity
self.state.daily_losses = 0.0
self.state.trade_date = date.today()
self.state.current_equity = current_equity
self.state.peak_equity = max(self.state.peak_equity, current_equity)
# Kill switch 1: max drawdown from peak
drawdown = (self.state.peak_equity - current_equity) / self.state.peak_equity
if drawdown >= self.config.max_drawdown_pct:
self.state.halted = True
self.state.halt_reason = f"Max drawdown {drawdown:.1%} exceeded {self.config.max_drawdown_pct:.1%}"
return False
# Kill switch 2: daily loss limit
daily_loss = (self.state.daily_start_equity - current_equity) / self.state.daily_start_equity
if daily_loss >= self.config.daily_loss_limit_pct:
self.state.halted = True
self.state.halt_reason = f"Daily loss {daily_loss:.1%} exceeded {self.config.daily_loss_limit_pct:.1%}"
return False
return True
def approve_trade(self, conviction: float) -> bool:
"""Check conviction threshold before allowing execution."""
if self.state.halted:
return False
if conviction < self.config.min_conviction:
return False
return True
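A quick sanity check of the sizing math, using the backtest stats that appear later in main.py (win rate 0.54, avg win 2.2%, avg loss 1.4%). This standalone sketch applies the textbook Kelly formula f* = p - (1 - p)/b; with these numbers the half-Kelly fraction comes out well above 2%, so the hard cap is what actually binds, which is exactly what you want from a cap:

```python
def half_kelly_size(win_rate: float, avg_win: float, avg_loss: float,
                    conviction: float, cap: float = 0.02) -> float:
    """Half-Kelly scaled by conviction, floored at 0 and hard-capped."""
    b = avg_win / avg_loss                  # Payoff ratio
    kelly = win_rate - (1 - win_rate) / b   # Classic Kelly: p - q/b
    scaled = (kelly / 2) * conviction
    return min(max(scaled, 0.0), cap)

# With the backtest stats from this guide:
print(half_kelly_size(0.54, 0.022, 0.014, conviction=0.6))  # prints 0.02 (the cap binds)
```

Note the floor at zero: a negative-edge strategy (kelly < 0) gets size zero, never a short-the-signal inversion.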
Step 5: Backtest the Strategy with vectorbt
Before touching live markets, validate your combined signal → risk pipeline on historical data. vectorbt runs vectorized backtests in seconds, not minutes.
# backtest/run_backtest.py
import vectorbt as vbt
import pandas as pd
from data.fetcher import fetch_ohlcv
from signals.quant_signals import compute_signals
def run_backtest(symbol: str = "BTC/USDT", timeframe: str = "1h"):
df = fetch_ohlcv(symbol, timeframe, limit=5000) # ~7 months of 1h data
df = compute_signals(df)
close = df["close"]
entries = df["signal"] == 1
exits = df["signal"] == -1
# 1.5% stop loss (matches RiskConfig.stop_loss_pct)
pf = vbt.Portfolio.from_signals(
close,
entries,
exits,
sl_stop=0.015, # Stop loss at 1.5%
init_cash=10_000,
fees=0.001, # 0.1% maker/taker (Binance standard)
slippage=0.0005, # 0.05% slippage estimate
)
stats = pf.stats()
print(stats)
# Extract win rate and avg win/loss for Kelly sizing
trades = pf.trades.records_readable
win_rate = (trades["PnL"] > 0).mean()
avg_win = trades.loc[trades["PnL"] > 0, "Return"].mean()
avg_loss = abs(trades.loc[trades["PnL"] < 0, "Return"].mean())
return {
"total_return": stats["Total Return [%]"],
"sharpe": stats["Sharpe Ratio"],
"max_drawdown": stats["Max Drawdown [%]"],
"win_rate": win_rate,
"avg_win_pct": avg_win,
"avg_loss_pct": avg_loss,
"n_trades": len(trades),
}
if __name__ == "__main__":
results = run_backtest()
print(f"\nWin rate: {results['win_rate']:.1%}")
print(f"Sharpe: {results['sharpe']:.2f}")
print(f"Max drawdown: {results['max_drawdown']:.1f}%")
print(f"Kelly inputs → win_rate={results['win_rate']:.2f}, avg_win={results['avg_win_pct']:.3f}, avg_loss={results['avg_loss_pct']:.3f}")
Before going live, require:
- Sharpe Ratio > 1.0
- Max Drawdown < 20%
- N trades > 50 (statistical significance)
If your backtest doesn't pass these gates, fix the signal — not the risk rules.
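Those gates are worth encoding so the go-live decision isn't eyeballed. A small helper (a sketch; the function name is ours, and the dict keys mirror run_backtest's return value) you could call at the end of the backtest script:

```python
def passes_go_live_gates(results: dict) -> tuple[bool, list[str]]:
    """Check backtest results against the minimum go-live gates. Returns (ok, failures)."""
    failures = []
    if results["sharpe"] <= 1.0:
        failures.append(f"Sharpe {results['sharpe']:.2f} <= 1.0")
    if results["max_drawdown"] >= 20.0:  # vectorbt reports drawdown in percent
        failures.append(f"Max drawdown {results['max_drawdown']:.1f}% >= 20%")
    if results["n_trades"] <= 50:
        failures.append(f"Only {results['n_trades']} trades (need > 50)")
    return (not failures, failures)
```

Returning the list of failures, not just a boolean, tells you which gate to attack first.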
Step 6: Wire the Execution Layer
CCXT abstracts 100+ exchanges behind one interface. We use Binance Testnet for paper trading before flipping to live.
# execution/executor.py
import ccxt
import os
from dataclasses import dataclass
@dataclass
class Order:
symbol: str
side: str # "buy" or "sell"
amount_usdt: float
stop_loss_pct: float = 0.015
class Executor:
def __init__(self, paper: bool = True):
api_key = os.getenv("BINANCE_API_KEY")
api_secret = os.getenv("BINANCE_API_SECRET")
        self.exchange = ccxt.binance({
            "apiKey": api_key,
            "secret": api_secret,
            "enableRateLimit": True,
            "options": {"defaultType": "spot"},
        })
        if paper:
            # Route all requests to the Binance Spot Testnet (testnet.binance.vision)
            self.exchange.set_sandbox_mode(True)
        self.paper = paper
def get_equity(self) -> float:
"""Return total USDT value of portfolio."""
balance = self.exchange.fetch_balance()
return balance["USDT"]["total"]
def place_order(self, order: Order) -> dict:
"""
Place market buy/sell with immediate stop-loss OCO.
Returns order dict or raises on failure.
"""
ticker = self.exchange.fetch_ticker(order.symbol)
current_price = ticker["last"]
# Convert USDT amount to base asset quantity
quantity = order.amount_usdt / current_price
        # Round to the exchange's lot-size precision (markets must be loaded first)
        self.exchange.load_markets()
        quantity = self.exchange.amount_to_precision(order.symbol, quantity)
if self.paper:
print(f"[PAPER] {order.side.upper()} {quantity} {order.symbol} @ ~{current_price:.2f}")
return {"status": "paper", "price": current_price, "amount": quantity}
# Live order
market_order = self.exchange.create_order(
order.symbol,
"market",
order.side,
quantity,
)
        # Immediately place a protective stop on the opposite side.
        # Note: verify the stop order type against your ccxt version; Binance
        # spot maps stops to STOP_LOSS / STOP_LOSS_LIMIT, not futures-style types.
        stop_price = current_price * (1 - order.stop_loss_pct) if order.side == "buy" \
            else current_price * (1 + order.stop_loss_pct)
        self.exchange.create_order(
            order.symbol,
            "stop_market",
            "sell" if order.side == "buy" else "buy",
            quantity,
            params={"stopPrice": self.exchange.price_to_precision(order.symbol, stop_price)},
        )
return market_order
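The executor makes several network calls per order, and any of them can fail transiently. A small retry helper with exponential backoff (a sketch; wrap only idempotent reads like fetch_ticker or fetch_balance with it, since blindly retrying create_order risks double fills):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 retryable: tuple = (ConnectionError, TimeoutError)) -> T:
    """Call fn(), retrying transient errors with exponential backoff (1s, 2s, 4s, ...)."""
    for i in range(attempts):
        try:
            return fn()
        except retryable:
            if i == attempts - 1:
                raise  # Out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** i))
    raise RuntimeError("unreachable")
```

With ccxt you would pass retryable=(ccxt.NetworkError,), the base class ccxt raises for transient connectivity errors, e.g. `with_retries(lambda: exchange.fetch_ticker("BTC/USDT"), retryable=(ccxt.NetworkError,))`.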
Step 7: Wire Everything Into the Trading Loop
# main.py
import time
import os
from dotenv import load_dotenv
from data.fetcher import fetch_ohlcv
from signals.quant_signals import compute_signals
from signals.llm_scorer import score_signal
from risk.manager import RiskManager, RiskConfig
from execution.executor import Executor, Order
load_dotenv()
# --- Config ---
SYMBOL = "BTC/USDT"
TIMEFRAME = "1h"
PAPER_TRADING = True # Set False only after paper trading passes 2 weeks
# From your backtest output
BACKTEST_WIN_RATE = 0.54
BACKTEST_AVG_WIN = 0.022
BACKTEST_AVG_LOSS = 0.014
def run_loop():
executor = Executor(paper=PAPER_TRADING)
initial_equity = executor.get_equity()
risk = RiskManager(RiskConfig(), initial_equity)
print(f"Starting with equity: ${initial_equity:.2f} | Paper: {PAPER_TRADING}")
while True:
try:
# 1. Fetch fresh data
df = fetch_ohlcv(SYMBOL, TIMEFRAME, limit=100)
df = compute_signals(df)
            latest = df.iloc[-1]
            # Dedup: with a 60s poll over 1h bars, the same bar's signal would
            # otherwise re-fire (and re-trade) every minute until the bar closes
            if latest["signal"] == 0 or getattr(run_loop, "_last_bar", None) == df.index[-1]:
                time.sleep(60)
                continue
            run_loop._last_bar = df.index[-1]  # Evaluate each bar at most once
# 2. Check risk kill switches
current_equity = executor.get_equity()
if not risk.check_and_update(current_equity):
print(f"[HALT] {risk.state.halt_reason}")
break
# 3. Score signal with LLM
recent_summary = (
f"Last 5 candles: {df['close'].tail(5).round(0).tolist()}. "
f"Volume trend: {'rising' if df['volume'].tail(3).is_monotonic_increasing else 'falling'}."
)
ema_spread = (latest["ema_fast"] - latest["ema_slow"]) / latest["ema_slow"] * 100
llm_result = score_signal(
SYMBOL, int(latest["signal"]), recent_summary,
float(latest["rsi"]), float(ema_spread)
)
conviction = llm_result["conviction"]
print(f"Signal: {'LONG' if latest['signal'] == 1 else 'SHORT'} | "
f"Conviction: {conviction:.2f} | Suppress: {llm_result['suppress']} | "
f"{llm_result['reasoning']}")
if llm_result["suppress"] or not risk.approve_trade(conviction):
print("Trade suppressed by risk/LLM filter.")
time.sleep(60)
continue
# 4. Kelly position sizing
position_pct = risk.kelly_position_size(
current_equity,
BACKTEST_WIN_RATE,
BACKTEST_AVG_WIN,
BACKTEST_AVG_LOSS,
conviction,
)
trade_usdt = current_equity * position_pct
print(f"Position size: ${trade_usdt:.2f} ({position_pct:.2%} of equity)")
# 5. Execute
side = "buy" if latest["signal"] == 1 else "sell"
order = Order(SYMBOL, side, trade_usdt)
result = executor.place_order(order)
print(f"Order result: {result}")
except KeyboardInterrupt:
print("Shutting down trading loop.")
break
except Exception as e:
print(f"[ERROR] {e}")
time.sleep(30) # Back off 30s on unexpected error, don't crash
time.sleep(60) # 1-minute polling interval for 1h bars is sufficient
if __name__ == "__main__":
run_loop()
Verification
Run the backtest first and confirm it passes the gates:
python backtest/run_backtest.py
You should see output like:
Start Value 10000.00
End Value 14230.00
Total Return [%] 42.30
Sharpe Ratio 1.24
Max Drawdown [%] 12.40
Total Trades 87
Win rate: 54.0%
Sharpe: 1.24
Max drawdown: 12.4%
Then run in paper mode for at least two weeks:
# Copy .env.example to .env and fill in testnet keys
cp .env.example .env
# Paper trading — safe to run 24/7
python main.py
Check the kill switches work:
python -c "
from risk.manager import RiskManager, RiskConfig
rm = RiskManager(RiskConfig(), 10000)
# Simulate 16% drawdown — should trigger halt
allowed = rm.check_and_update(8400)
print('Trading allowed:', allowed)
print('Halt reason:', rm.state.halt_reason)
"
Expected:
Trading allowed: False
Halt reason: Max drawdown 16.0% exceeded 15.0%
Production Deployment Notes
Running this locally means your bot dies when your laptop sleeps. For always-on execution:
Minimum viable production setup:
- A $6/mo VPS (DigitalOcean, Vultr) running Ubuntu 24.04
- A systemd service or Docker container for auto-restart
- Telegram bot integration for halt alerts (add python-telegram-bot and call it on halt)
- Log all trades to SQLite with sqlite3; you need an audit trail for tax purposes
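For the halt alerts you don't strictly need python-telegram-bot; Telegram's Bot API is a single HTTP POST. A dependency-free sketch (the env var names are our convention; the token comes from @BotFather):

```python
import json
import os
import urllib.request

def build_alert(token: str, chat_id: str, text: str) -> tuple[str, bytes]:
    """Build the Telegram sendMessage URL and JSON payload."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode()
    return url, payload

def send_halt_alert(text: str) -> None:
    """POST a halt notification to a Telegram chat via the Bot API."""
    url, payload = build_alert(os.environ["TELEGRAM_BOT_TOKEN"],
                               os.environ["TELEGRAM_CHAT_ID"], text)
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)
```

Call send_halt_alert(risk.state.halt_reason) in the trading loop's halt branch so a kill switch wakes you up instead of silently stopping the bot.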
# Quick systemd service
sudo tee /etc/systemd/system/tradingbot.service << EOF
[Unit]
Description=AI Trading Bot
After=network.target
[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/tradingbot
ExecStart=/usr/bin/python3 main.py
Restart=on-failure
RestartSec=30
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable tradingbot
sudo systemctl start tradingbot
What You Learned
- The LLM belongs in the signal scoring layer, not the execution layer — it filters trades, never fires them
- Kelly Criterion with a conviction multiplier makes position sizing dynamic and mathematically grounded
- Three hard risk rules (per-trade stop loss, daily loss limit, max drawdown kill switch) need to be hardcoded, not configurable at runtime
- Paper trade for two weeks minimum before live — the edge case that blows accounts usually shows up in week two
- LLM scorer must have a fallback; API downtime during a signal event is not hypothetical
Hard limitations of this setup:
- Signal is based on 1h bars — it misses intraday moves and is not suitable for scalping
- Backtests don't capture slippage accurately on large size; scale up position sizes cautiously
- The LLM scorer adds ~1–2 seconds of latency per signal event — acceptable for 1h bars, not for HFT
Tested on Python 3.12, vectorbt 0.26.2, ccxt 4.3.x, Binance Testnet, Ubuntu 24.04