Problem: DeepSeek R1 Returns Reasoning Tokens You Don't Know How to Handle
DeepSeek R1 isn't a standard chat API. It returns a reasoning_content field alongside the regular content field — and if you send requests without accounting for that, you'll get malformed responses, hit token limits unexpectedly, or silently discard the reasoning trace your prompt was designed to produce.
You'll learn:
- How to authenticate and send your first DeepSeek R1 API request in Python
- How to extract and use reasoning_content vs content correctly
- How to stream responses and handle rate limits without dropping requests
Time: 20 min | Difficulty: Intermediate
Why DeepSeek R1 Is Different from Standard LLM APIs
Most LLM APIs return one field: the model's reply. DeepSeek R1 is a reasoning model — it thinks before it answers. That thinking is exposed as reasoning_content in the response object, separate from the final content.
If you build a client expecting only content, you'll miss the reasoning entirely. Worse, if you pass reasoning_content back in the next turn's message history, the API will error — it's output-only.
Two fields to know:
- message.reasoning_content — the chain-of-thought trace (can be thousands of tokens)
- message.content — the final answer to return to users
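The split can be sketched as a tiny helper before any client code exists. The getattr guard is a defensive assumption for responses that omit the field entirely (for example, from deepseek-chat):

```python
def split_message(message) -> dict:
    """Separate R1's thinking from its final answer."""
    return {
        # getattr guards against message objects with no reasoning_content at all
        "reasoning": getattr(message, "reasoning_content", None),
        "answer": message.content,
    }
```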
Solution
Step 1: Install the OpenAI SDK and Set Up Your Key
DeepSeek's API is OpenAI-compatible. Use the openai Python package with a custom base_url.
# uv is faster than pip — use it if you have it
uv add openai python-dotenv
# or pip
pip install openai python-dotenv
Create a .env file:
DEEPSEEK_API_KEY=sk-your-key-here
Get your key from platform.deepseek.com.
Step 2: Initialize the Client
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com", # Not api.openai.com
)
The base_url override is all that separates this from a standard OpenAI client. Every other method call is identical.
Step 3: Send a Request and Extract Both Fields
def ask_r1(prompt: str) -> dict:
response = client.chat.completions.create(
model="deepseek-reasoner", # R1 model identifier
messages=[{"role": "user", "content": prompt}],
)
message = response.choices[0].message
return {
# Chain-of-thought — useful for debugging, logging, or showing "thinking"
"reasoning": message.reasoning_content,
# Final answer — what you show to users
"answer": message.content,
}
result = ask_r1("What is the time complexity of merge sort, and why?")
print("=== Reasoning ===")
print(result["reasoning"])
print("\n=== Answer ===")
print(result["answer"])
Expected output:
=== Reasoning ===
Let me think through merge sort step by step. Merge sort divides the array...
[several paragraphs of reasoning]
=== Answer ===
Merge sort has O(n log n) time complexity in all cases...
If it fails:
- AuthenticationError → Double-check DEEPSEEK_API_KEY in .env and that load_dotenv() runs before the client is created
- model not found → Use "deepseek-reasoner" exactly — not "deepseek-r1" or "r1"
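A quick preflight check catches the most common of these failures before any request is sent. This is a sketch; it assumes DeepSeek keys use the sk- prefix shown in the .env example above:

```python
import os

def preflight() -> None:
    """Fail fast on a missing or malformed API key."""
    key = os.environ.get("DEEPSEEK_API_KEY", "")
    if not key:
        raise RuntimeError("DEEPSEEK_API_KEY is unset: did load_dotenv() run first?")
    if not key.startswith("sk-"):
        raise RuntimeError("DEEPSEEK_API_KEY looks malformed (expected an sk- prefix)")
```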
Step 4: Stream Responses for Long Reasoning Traces
R1's reasoning can run 2,000–8,000 tokens before producing an answer. Without streaming, your user waits in silence for 30+ seconds. Stream it.
def ask_r1_stream(prompt: str) -> dict:
stream = client.chat.completions.create(
model="deepseek-reasoner",
messages=[{"role": "user", "content": prompt}],
stream=True,
)
reasoning_buffer = []
answer_buffer = []
in_reasoning = True
for chunk in stream:
delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
# Still in the thinking phase
reasoning_buffer.append(delta.reasoning_content)
print(delta.reasoning_content, end="", flush=True)
elif delta.content:
# Reasoning finished; answer is streaming now
if in_reasoning:
in_reasoning = False
print("\n\n--- Answer ---\n")
answer_buffer.append(delta.content)
print(delta.content, end="", flush=True)
print() # Newline after stream ends
return {
"reasoning": "".join(reasoning_buffer),
"answer": "".join(answer_buffer),
}
The transition from reasoning_content chunks to content chunks marks when R1 stops thinking and starts answering. Track that boundary to split your UI into "thinking..." and final response states.
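One way to hand that boundary to a UI layer is to normalize the stream into tagged events. This is a sketch that works on any iterable of chunk-shaped objects; the first "answer" event is the phase transition:

```python
def r1_events(stream):
    """Yield ("reasoning", text) events, then ("answer", text) events.

    The first "answer" event marks the end of the thinking phase.
    """
    for chunk in stream:
        delta = chunk.choices[0].delta
        # getattr: chunks from non-reasoner models have no reasoning_content
        if getattr(delta, "reasoning_content", None):
            yield ("reasoning", delta.reasoning_content)
        elif delta.content:
            yield ("answer", delta.content)
```

A consumer can switch from a "thinking..." state to the final-response state the moment the event tag changes.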
Step 5: Build Correct Multi-Turn Message History
This is where most R1 integrations break. You cannot include reasoning_content in the conversation history you send back to the API.
# ✅ Correct — only include role/content in history
def build_history(turns: list[dict]) -> list[dict]:
"""
turns = [{"user": "...", "answer": "..."}, ...]
reasoning_content is NOT sent back — API rejects it
"""
messages = []
for turn in turns:
messages.append({"role": "user", "content": turn["user"]})
messages.append({"role": "assistant", "content": turn["answer"]})
return messages
conversation = []
def chat(user_message: str) -> str:
history = build_history(conversation)
history.append({"role": "user", "content": user_message})
response = client.chat.completions.create(
model="deepseek-reasoner",
messages=history,
)
message = response.choices[0].message
result = {
"user": user_message,
"reasoning": message.reasoning_content,
"answer": message.content,
}
conversation.append(result)
return result["answer"]
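If you store raw message dicts rather than the turns structure above, strip the extra field before resending. sanitize_history here is a hypothetical helper, not part of the SDK:

```python
def sanitize_history(messages: list[dict]) -> list[dict]:
    """Keep only the keys the API accepts in inbound history."""
    # reasoning_content (or any other extra key) must not be sent back
    return [{"role": m["role"], "content": m["content"]} for m in messages]
```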
Step 6: Handle Rate Limits with Exponential Backoff
DeepSeek throttles requests under load, and a 429 can arrive on any tier, especially during peak hours. Don't let it crash your app.
import time
import random
from openai import RateLimitError
def ask_r1_with_retry(prompt: str, max_retries: int = 4) -> dict:
for attempt in range(max_retries):
try:
response = client.chat.completions.create(
model="deepseek-reasoner",
messages=[{"role": "user", "content": prompt}],
)
message = response.choices[0].message
return {"reasoning": message.reasoning_content, "answer": message.content}
except RateLimitError:
if attempt == max_retries - 1:
raise # Out of retries, let it bubble up
# Exponential backoff with jitter: 2s, 4s, 8s + random 0–1s
wait = (2 ** attempt) + random.random()
print(f"Rate limited. Retrying in {wait:.1f}s...")
time.sleep(wait)
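The same pattern generalizes to any call with a decorator. with_backoff is a name invented here; it keeps the 2s/4s/8s schedule while staying independent of the SDK, so you can apply it as @with_backoff(RateLimitError) to any function wrapping the create call:

```python
import functools
import random
import time

def with_backoff(exc_type, max_retries: int = 4, base: float = 2.0):
    """Retry on exc_type with exponential backoff plus proportional jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except exc_type:
                    if attempt == max_retries - 1:
                        raise  # out of retries, let it bubble up
                    # 2s, 4s, 8s with base=2.0, plus up to `base` seconds of jitter
                    time.sleep(base * (2 ** attempt) + random.random() * base)
        return wrapper
    return decorator
```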
Verification
Run this end-to-end test:
if __name__ == "__main__":
result = ask_r1_with_retry("Solve: if 2x + 3 = 11, what is x?")
assert result["answer"], "answer field is empty"
assert result["reasoning"], "reasoning field is empty — check model name"
assert "4" in result["answer"], "unexpected answer content"
print("✅ DeepSeek R1 client working correctly")
    print(f"Reasoning length: ~{len(result['reasoning'].split())} words")
    print(f"Answer: {result['answer']}")
You should see:
✅ DeepSeek R1 client working correctly
Reasoning length: ~180 words
Answer: x = 4
What You Learned
- deepseek-reasoner returns two separate fields: reasoning_content (thinking) and content (answer) — handle both explicitly
- Never include reasoning_content in conversation history — send only content back
- Streaming is essential for R1; reasoning traces routinely exceed 2,000 tokens before the answer begins
- Exponential backoff with jitter is the correct pattern for 429s on shared API infrastructure
Limitation: deepseek-reasoner does not support function calling or tool use as of March 2026. For tool-use workflows, use deepseek-chat (V3) instead and call R1 separately for reasoning-heavy subtasks.
Tested on DeepSeek API v1, openai SDK 1.68.0, Python 3.12, macOS and Ubuntu 24.04