Problem: You Need a Capable LLM Without US Cloud Lock-In
Your team needs a frontier-class language model, but OpenAI and Anthropic's APIs mean storing data on US infrastructure — a problem for GDPR-sensitive workloads. Mistral Large 3 closes the gap: it benchmarks near GPT-4o on reasoning and code, runs under an Apache 2.0 license, and you can self-host it entirely in the EU.
You'll learn:
- How to call Mistral Large 3 via the Mistral API in under 10 lines of Python
- When to use the managed API vs. self-hosted vLLM
- How to swap Mistral into an existing OpenAI-compatible codebase with minimal changes
Time: 20 min | Level: Intermediate
Why This Model Matters
Mistral Large 3 (123B parameters) is Mistral AI's answer to GPT-4-class models. It ships with a 128K context window, native function calling, and strong multilingual support across 12+ languages — including French, German, Spanish, and Arabic.
What makes it different:
- Apache 2.0 license — you can fine-tune and redistribute commercially
- OpenAI-compatible API surface — minimal migration effort
- Full self-hosting via llama.cpp or vLLM on a single 8×H100 node
When NOT to use it:
- You need sub-5B inference on CPU (use Mistral 7B instead)
- Your team has no MLOps capacity for self-hosting (use the managed API)
- You need real-time image understanding (Mistral Large 3 is text-only)
Solution
Step 1: Set Up the Mistral Client
Install the SDK:
```bash
pip install mistralai
```
```python
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "Summarize the EU AI Act in 3 bullet points."}
    ],
)

print(response.choices[0].message.content)
```
Expected: A clean 3-point summary, typically in under 2 seconds.
If it fails:
- `AuthenticationError`: Check your key at console.mistral.ai — keys are workspace-scoped
- Model not found: Use `mistral-large-latest`, not `mistral-large-3` — Mistral aliases it to the latest stable release
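Transient 429s and 5xx responses are common once you move past prototyping. A minimal retry wrapper with exponential backoff, as a sketch — `with_retries` is not part of the `mistralai` SDK, and you should narrow the caught exception to whatever error types your SDK version actually raises:

```python
import time


def with_retries(call, attempts=3, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff.

    `call` is any function that performs the API request, e.g.
    lambda: client.chat.complete(...). Illustrative helper, not part
    of the mistralai SDK.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # narrow to the SDK's error types in real code
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Usage: `response = with_retries(lambda: client.chat.complete(model="mistral-large-latest", messages=messages))`.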
Step 2: Enable Function Calling
Mistral Large 3 supports the same tool-calling schema as OpenAI. If you already have OpenAI tool definitions, they drop in unchanged.
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # Let the model decide when to call tools
)

# Check if a tool was called
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Calling: {tool_call.function.name}")
    print(f"Args: {tool_call.function.arguments}")
```
Why tool_choice="auto": Mistral handles ambiguous queries better when it can decide whether a tool is needed. Force it with "any" only when the tool must always run.
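The `arguments` field arrives as a JSON string, so the other half of the loop is parsing it, running your function, and sending the result back in a `tool` message. A sketch of the dispatch step — the `get_weather` body here is a stand-in, not a real weather lookup:

```python
import json


def get_weather(city, units="celsius"):
    # Stand-in implementation; replace with a real weather lookup
    return {"city": city, "temperature": 21, "units": units}


def run_tool_call(tool_call):
    """Parse a tool call's JSON-string arguments and dispatch locally."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "get_weather":
        return get_weather(**args)
    raise ValueError(f"Unknown tool: {tool_call.function.name}")
```

To finish the round trip, append a message of the form `{"role": "tool", "name": tool_call.function.name, "content": json.dumps(result), "tool_call_id": tool_call.id}` to the conversation and call `client.chat.complete` again so the model can phrase the final answer.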
Step 3: Migrate from OpenAI (If Applicable)
Mistral's API is OpenAI-compatible. If you're using the openai Python SDK, swap two lines:
```python
# Before (OpenAI)
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After (Mistral — same interface)
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MISTRAL_API_KEY"],
    base_url="https://api.mistral.ai/v1",  # Point to Mistral's endpoint
)

# Everything else stays the same
response = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}],
)
```
If it fails:
- Model routing errors: OpenAI-specific models (like `gpt-4o`) don't exist on Mistral's endpoint — update your model names
- Streaming differences: Mistral uses the same SSE format, but check your chunk parsing if you see empty deltas
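If your codebase hard-codes OpenAI model names at many call sites, a small translation table avoids touching each one. The pairings below are suggestions for this migration, not official equivalences — tune them to your quality and cost needs:

```python
# Suggested OpenAI -> Mistral substitutions (assumptions, not official mappings)
MODEL_MAP = {
    "gpt-4o": "mistral-large-latest",
    "gpt-4-turbo": "mistral-large-latest",
    "gpt-4o-mini": "mistral-small-latest",
}


def to_mistral_model(name):
    """Translate an OpenAI model name; pass Mistral names through unchanged."""
    return MODEL_MAP.get(name, name)
```

Wrap the `model=` argument at your call sites with `to_mistral_model(...)` and the rest of the code stays untouched.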
Step 4: Self-Host with vLLM (Optional)
For full data sovereignty, run Mistral Large 3 on your own infra. You'll need 8×H100 80GB GPUs for FP16, or 4×H100 with AWQ quantization.
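The GPU requirement follows from simple arithmetic: 123B parameters at 2 bytes each for FP16 already exceed a four-GPU budget before any KV cache is counted. A back-of-the-envelope check, with rounded figures and ignoring activation overhead:

```python
def weight_memory_gb(params_billion, bytes_per_param):
    """Approximate memory for model weights alone, in GB."""
    return params_billion * 1e9 * bytes_per_param / 1e9


fp16 = weight_memory_gb(123, 2)    # ~246 GB of weights -> 8x80 GB node
awq4 = weight_memory_gb(123, 0.5)  # ~4-bit AWQ, ~62 GB -> fits 4x80 GB
print(f"FP16 weights: ~{fp16:.0f} GB, AWQ 4-bit: ~{awq4:.0f} GB")
```

The gap between the ~246 GB of FP16 weights and the 640 GB of an 8×H100 node is what the KV cache and activations consume — which is why the server below caps context at 32K rather than the full 128K.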
```bash
# Install vLLM
pip install vllm

# Start the server (tensor-parallel across 8 GPUs)
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-Large-Instruct-2407 \
    --tensor-parallel-size 8 \
    --max-model-len 32768 \
    --port 8000
```
Your app code stays identical — just update base_url to your server:
```python
client = OpenAI(
    api_key="not-needed",  # vLLM doesn't require auth by default
    base_url="http://your-server:8000/v1",
)
```
Quantized option (4 GPUs, ~10% quality loss):
```bash
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-Large-Instruct-2407 \
    --quantization awq \
    --tensor-parallel-size 4
```
Verification
```bash
curl https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-large-latest", "messages": [{"role": "user", "content": "Reply with OK"}]}'
```
You should see: JSON with "content": "OK" in choices[0].message. Latency should be under 3 seconds for this prompt.
Managed API vs. Self-Hosted: Quick Reference
| | Managed API | Self-Hosted (vLLM) |
|---|---|---|
| Setup time | 5 minutes | 2–4 hours |
| Data residency | Mistral's EU servers | Your infra |
| Cost at scale | Per-token billing | GPU amortization |
| Context window | 128K | 32K–128K (configurable) |
| Compliance | SOC 2, GDPR | You control everything |
Use the managed API for prototypes and low-to-medium volume. Self-host when you're processing >10M tokens/day or have strict data residency requirements.
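The >10M tokens/day threshold is easy to sanity-check against your own numbers. A sketch of the break-even arithmetic — every price below is a hypothetical placeholder, so substitute your actual GPU rate and API price before drawing conclusions:

```python
def breakeven_tokens_per_day(gpu_cost_per_day, price_per_million_tokens):
    """Tokens/day at which self-hosting spend matches per-token API billing."""
    return gpu_cost_per_day / price_per_million_tokens * 1_000_000


# Hypothetical inputs: 8 GPUs at $2/hr each, $6 per 1M tokens blended API price
tokens = breakeven_tokens_per_day(gpu_cost_per_day=8 * 2 * 24,
                                  price_per_million_tokens=6.0)
print(f"Break-even: {tokens:,.0f} tokens/day")
```

With those placeholder inputs the break-even lands in the tens of millions of tokens per day, consistent with the rule of thumb above; cheaper GPUs or pricier API tiers shift it down.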
What You Learned
- Mistral Large 3's API is a near-drop-in replacement for OpenAI's `gpt-4` endpoints
- Tool calling uses the same schema — existing OpenAI tool definitions work without modification
- Self-hosting requires serious GPU hardware but gives you complete control over data and costs
- The Apache 2.0 license means you can fine-tune and deploy commercially without Mistral's involvement
Limitation: Mistral Large 3 is text-only. For multimodal workloads, look at Pixtral Large (Mistral's vision model) or keep a separate pipeline for image tasks.
Tested on mistralai SDK 1.x, Python 3.12, vLLM 0.4.x, Ubuntu 22.04