Problem: DeepSeek-V3 Streams Arrive as Garbled Chunks
You're calling the DeepSeek-V3 API and want real-time token streaming — but your Node.js app either waits for the full response before printing anything, crashes on partial JSON chunks, or drops tokens entirely.
You'll learn:
- How to stream DeepSeek-V3 responses correctly using raw `fetch`
- How to use the OpenAI-compatible SDK (simpler, recommended)
- How to handle partial chunks, encoding issues, and errors mid-stream
Time: 12 min | Level: Intermediate
Why This Happens
DeepSeek-V3's API is OpenAI-compatible and uses Server-Sent Events (SSE) for streaming. Each chunk arrives as a line prefixed with `data: `, followed by a JSON payload. The stream ends with `data: [DONE]`.
The problem is that Node.js's `fetch` returns a `ReadableStream` of raw bytes, not lines. If you don't decode and split correctly, you'll get partial JSON, merged chunks, or silent failures.
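The pitfall is easiest to see in isolation: one SSE event can arrive split across two network chunks, so a line buffer is required. The sketch below is illustrative; `extractSSEPayloads` is a hypothetical helper, not part of any API.

```javascript
// Minimal sketch of the buffering problem: a logical SSE event can be
// split across two network chunks, so incomplete lines must be carried over.
function extractSSEPayloads(buffer, chunk) {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop(); // incomplete trailing line stays buffered
  const payloads = lines
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice(6)); // strip the "data: " prefix
  return { payloads, rest };
}

// A single event arriving in two pieces:
let state = { payloads: [], rest: "" };
state = extractSSEPayloads(state.rest, 'data: {"choices":[{"delta":{"con');
// → no complete line yet; the fragment waits in state.rest
state = extractSSEPayloads(state.rest, 'tent":"Hi"}}]}\n');
// → state.payloads is now ['{"choices":[{"delta":{"content":"Hi"}}]}']
```

Parsing each chunk directly, without the buffer, is exactly what produces the `JSON.parse` failures described above.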
Common symptoms:
- Response only prints after full completion (buffering issue)
- `JSON.parse` throws on `data: [DONE]` or partial chunks
- First few tokens appear, then the stream hangs or drops
Solution
Step 1: Set Up Your Project
mkdir deepseek-stream && cd deepseek-stream
npm init -y
npm install openai # Optional but recommended
Set your API key:
export DEEPSEEK_API_KEY="your-key-here"
Expected: A clean install with no peer dependency warnings.
Step 2: Stream with Raw Fetch (No Dependencies)
This approach works with any runtime that supports fetch.
// stream-fetch.js
async function streamDeepSeek(prompt) {
const response = await fetch("https://api.deepseek.com/chat/completions", {
method: "POST",
headers: {
"Content-Type": "application/json",
"Authorization": `Bearer ${process.env.DEEPSEEK_API_KEY}`,
},
body: JSON.stringify({
model: "deepseek-chat", // DeepSeek-V3
messages: [{ role: "user", content: prompt }],
stream: true, // REQUIRED: enables SSE mode
}),
});
if (!response.ok) {
throw new Error(`API error: ${response.status} ${await response.text()}`);
}
// Decode bytes → text using TextDecoder
const decoder = new TextDecoder("utf-8");
const reader = response.body.getReader();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Append decoded chunk to buffer (chunks can split mid-line)
buffer += decoder.decode(value, { stream: true });
// Process complete lines only
const lines = buffer.split("\n");
buffer = lines.pop(); // Keep the last incomplete line in the buffer
for (const line of lines) {
if (!line.startsWith("data: ")) continue; // Skip empty lines, comments
const payload = line.slice(6); // Strip "data: " prefix
if (payload === "[DONE]") return; // Stream finished cleanly
try {
const chunk = JSON.parse(payload);
const token = chunk.choices[0]?.delta?.content ?? "";
process.stdout.write(token); // Print without newline for inline streaming
} catch {
// Malformed chunk — skip silently (can happen on network hiccups)
}
}
}
}
streamDeepSeek("Explain async iterators in JavaScript in 3 sentences.")
.then(() => console.log("\n\nDone."))
.catch(console.error);
Expected: Tokens print one by one as they arrive, not all at once.
If it fails:
- `response.body` is null: Your Node.js version is below 18. Run `node --version` and upgrade to Node 22+.
- Tokens appear all at once: You're missing `stream: true` in the request body.
- `JSON.parse` throws frequently: Check the buffer handling. The `lines.pop()` call is critical to avoid splitting mid-JSON.
Tokens appear progressively: each word should arrive within milliseconds of the previous one.
Step 3: Stream with the OpenAI SDK (Recommended)
DeepSeek's API is 100% OpenAI-compatible. The SDK handles chunking, retries, and error parsing for you.
// stream-sdk.js
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.DEEPSEEK_API_KEY,
baseURL: "https://api.deepseek.com", // Point to DeepSeek instead of OpenAI
});
async function streamDeepSeekSDK(prompt) {
const stream = await client.chat.completions.create({
model: "deepseek-chat",
messages: [{ role: "user", content: prompt }],
stream: true,
});
// SDK returns an async iterable — no manual chunk parsing needed
for await (const chunk of stream) {
const token = chunk.choices[0]?.delta?.content ?? "";
process.stdout.write(token);
}
// Print a trailing newline after the stream ends
console.log("\n");
}
streamDeepSeekSDK("What is the difference between parallelism and concurrency?")
.catch(console.error);
Run it:
node --input-type=module < stream-sdk.js
# Or add "type": "module" to package.json and run: node stream-sdk.js
Why use the SDK over raw fetch? It retries failed requests (including rate limits) with backoff, and handles chunk parsing and the `[DONE]` sentinel for you. Use raw `fetch` only when you need zero dependencies.
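A common extension of this pattern: you often need the complete response (for logging or post-processing) as well as live output, so accumulate tokens while printing. This sketch is self-contained; `mockStream` is a stand-in for the real SDK stream object.

```javascript
// Sketch: collect the full completion while still streaming tokens live.
// mockStream stands in for the async iterable returned by
// client.chat.completions.create({ ..., stream: true }).
async function* mockStream() {
  const tokens = ["Hello", ", ", "world", "!"];
  for (const t of tokens) {
    yield { choices: [{ delta: { content: t } }] };
  }
}

async function streamAndCollect(stream) {
  let full = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(token); // live output
    full += token;               // accumulated copy
  }
  return full;
}

streamAndCollect(mockStream()).then((full) => {
  console.log(`\nFull response: ${full}`); // "Hello, world!"
});
```

The same `streamAndCollect` works unchanged on a real SDK stream, since both are async iterables with the same chunk shape.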
Step 4: Handle Errors Mid-Stream
Streams can fail after they've started (network drop, rate limit mid-response). Catch these separately from startup errors.
async function streamWithErrorHandling(prompt) {
let stream;
try {
stream = await client.chat.completions.create({
model: "deepseek-chat",
messages: [{ role: "user", content: prompt }],
stream: true,
});
} catch (err) {
// Startup errors: auth, bad request, network unreachable
console.error("Failed to start stream:", err.message);
return;
}
try {
for await (const chunk of stream) {
const token = chunk.choices[0]?.delta?.content ?? "";
process.stdout.write(token);
}
} catch (err) {
// Mid-stream errors: connection reset, timeout, server error
console.error("\nStream interrupted:", err.message);
// Optionally: retry the whole request; the API cannot resume a partial stream
}
}
If it fails:
- `APIConnectionError`: DeepSeek's server is unreachable. Check your network or add a retry wrapper.
- `RateLimitError` mid-stream: The SDK surfaces this as a thrown error inside the `for await` loop. Log it and retry after a delay.
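A retry wrapper along those lines can be sketched as follows. `retryWithBackoff` and its parameters are illustrative, not part of the SDK; note that a failed stream must be restarted from the beginning.

```javascript
// Sketch of a generic retry wrapper with exponential backoff.
async function retryWithBackoff(fn, { retries = 3, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(); // success: return the result immediately
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: rethrow
      const delay = baseMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      console.error(`Attempt ${attempt + 1} failed, retrying in ${delay}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Usage: wrap the whole streaming call, since a mid-stream failure means
// re-sending the request (there is no resume offset):
// await retryWithBackoff(() => streamDeepSeekSDK("Explain SSE."));
```

For user-facing output, consider discarding the partially printed text before retrying, so the user doesn't see the answer repeated.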
Verification
node stream-sdk.js
You should see: Text appearing word-by-word in your terminal within ~300ms of running the command.
Tokens arrive progressively; total time to first token should be under 500ms.
Run a quick latency check:
const start = Date.now();
let firstToken = true;
for await (const chunk of stream) {
const token = chunk.choices[0]?.delta?.content ?? "";
if (token && firstToken) {
console.log(`\nTime to first token: ${Date.now() - start}ms`);
firstToken = false;
}
process.stdout.write(token);
}
Expected: Time to first token under 800ms on a typical connection.
What You Learned
- DeepSeek-V3 uses SSE streaming: you must buffer, split by `\n`, then strip the `data: ` prefix before parsing JSON
- The `stream: true` flag in the request body is not optional; omitting it returns a single blocking response
- The OpenAI SDK works with DeepSeek out of the box by setting `baseURL`; use it for production
- Raw `fetch` streaming requires `TextDecoder` with `{ stream: true }` to handle multi-byte characters (e.g., Chinese text) without corruption
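The multi-byte point is easy to demonstrate without an API call: split the UTF-8 bytes of a Chinese string across two chunks and compare naive per-chunk decoding with streaming decoding.

```javascript
// "你" is 3 bytes in UTF-8 (0xE4 0xBD 0xA0). If a network chunk boundary
// falls inside it, decoding each chunk independently corrupts the text.
const bytes = new TextEncoder().encode("你好");
const part1 = bytes.slice(0, 2); // first 2 bytes of "你": incomplete sequence
const part2 = bytes.slice(2);    // remaining bytes

// Wrong: a fresh decode per chunk emits replacement characters (�)
const naive =
  new TextDecoder().decode(part1) + new TextDecoder().decode(part2);

// Right: { stream: true } buffers the incomplete sequence internally
const decoder = new TextDecoder("utf-8");
const correct =
  decoder.decode(part1, { stream: true }) +
  decoder.decode(part2, { stream: true });

console.log(naive);   // garbled output with � characters
console.log(correct); // 你好
```

`TextEncoder` and `TextDecoder` are globals in Node 18+, so this runs as-is.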
Limitation: This guide covers text completions only. If you're streaming tool calls or function outputs, `delta.tool_calls` has a different incremental structure: each chunk appends to `arguments` as a partial JSON string.
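That incremental structure can be sketched like this. The chunk shapes below are illustrative stand-ins for `delta.tool_calls` entries, not captured API traffic:

```javascript
// Sketch: accumulating streamed tool-call arguments. Each delta carries a
// fragment of the arguments JSON string; concatenate fragments per index.
const deltas = [
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Paris"}' } },
];

const calls = {};
for (const delta of deltas) {
  const call = (calls[delta.index] ??= { id: "", name: "", arguments: "" });
  if (delta.id) call.id = delta.id;                 // id arrives once
  if (delta.function?.name) call.name = delta.function.name; // name arrives once
  call.arguments += delta.function?.arguments ?? ""; // arguments accumulate
}

// Only parse once the stream is complete: partial strings are not valid JSON
const args = JSON.parse(calls[0].arguments);
console.log(calls[0].name, args); // get_weather { city: 'Paris' }
```

The key rule is the same as for text deltas: treat `arguments` as an append-only buffer and defer `JSON.parse` until the stream finishes.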
When NOT to use streaming: For batch jobs, embeddings, or when you need the full response before acting on it (e.g., structured JSON output), use non-streaming mode and parse `choices[0].message.content` directly.
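For reference, the non-streaming path is a one-shot call. This is a sketch assuming the same `client` configured earlier; the response shape follows the OpenAI-compatible API.

```javascript
// Sketch: non-streaming completion. Omitting `stream` (default false)
// returns one blocking JSON response instead of an SSE stream.
async function completeOnce(client, prompt) {
  const response = await client.chat.completions.create({
    model: "deepseek-chat",
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content; // full text, available at once
}
```

This trades time-to-first-token for simplicity: no chunk handling, and the result can be `JSON.parse`d or validated in one step.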
Tested on Node.js 22.x, openai SDK 4.x, DeepSeek-V3 (deepseek-chat model), macOS & Ubuntu 24.04