Handle DeepSeek-V3 Streaming Responses in Node.js in 12 Minutes

Stream DeepSeek-V3 API responses in Node.js using fetch and the OpenAI SDK. Fix partial chunks, encoding issues, and buffering problems.

Problem: DeepSeek-V3 Streams Arrive as Garbled Chunks

You're calling the DeepSeek-V3 API and want real-time token streaming — but your Node.js app either waits for the full response before printing anything, crashes on partial JSON chunks, or drops tokens entirely.

You'll learn:

  • How to stream DeepSeek-V3 responses correctly using raw fetch
  • How to use the OpenAI-compatible SDK (simpler, recommended)
  • How to handle partial chunks, encoding issues, and errors mid-stream

Time: 12 min | Level: Intermediate


Why This Happens

DeepSeek-V3's API is OpenAI-compatible and uses Server-Sent Events (SSE) for streaming. Each chunk arrives as a line prefixed with data: (note the trailing space), followed by a JSON payload. The stream ends with data: [DONE].

The problem is Node.js's fetch returns a ReadableStream of raw bytes — not lines. If you don't decode and split correctly, you'll get partial JSON, merged chunks, or silent failures.
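
A typical streaming response body looks like this on the wire (payloads abbreviated here; real chunks carry extra id and metadata fields):

```
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: [DONE]
```

Nothing guarantees the network delivers each data: line in a single read — a read can end mid-JSON, which is exactly what the buffering in Step 2 guards against.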

Common symptoms:

  • Response only prints after full completion (buffering issue)
  • JSON.parse throws on data: [DONE] or partial chunks
  • First few tokens appear, then the stream hangs or drops

Solution

Step 1: Set Up Your Project

mkdir deepseek-stream && cd deepseek-stream
npm init -y
npm install openai  # Optional but recommended

Set your API key:

export DEEPSEEK_API_KEY="your-key-here"

[Screenshot: terminal showing a successful npm install with no peer dependency warnings]


Step 2: Stream with Raw Fetch (No Dependencies)

This approach works with any runtime that supports fetch.

// stream-fetch.js
async function streamDeepSeek(prompt) {
  const response = await fetch("https://api.deepseek.com/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${process.env.DEEPSEEK_API_KEY}`,
    },
    body: JSON.stringify({
      model: "deepseek-chat",  // DeepSeek-V3
      messages: [{ role: "user", content: prompt }],
      stream: true,            // REQUIRED: enables SSE mode
    }),
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status} ${await response.text()}`);
  }

  // Decode bytes → text using TextDecoder
  const decoder = new TextDecoder("utf-8");
  const reader = response.body.getReader();

  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Append decoded chunk to buffer (chunks can split mid-line)
    buffer += decoder.decode(value, { stream: true });

    // Process complete lines only
    const lines = buffer.split("\n");
    buffer = lines.pop(); // Keep the last incomplete line in the buffer

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;  // Skip empty lines, comments
      const payload = line.slice(6).trim();       // Strip "data: " prefix and any trailing \r
      if (payload === "[DONE]") return;           // Stream finished cleanly

      try {
        const chunk = JSON.parse(payload);
        const token = chunk.choices[0]?.delta?.content ?? "";
        process.stdout.write(token); // Print without newline for inline streaming
      } catch {
        // Malformed chunk — skip silently (can happen on network hiccups)
      }
    }
  }
}

streamDeepSeek("Explain async iterators in JavaScript in 3 sentences.")
  .then(() => console.log("\n\nDone."))
  .catch(console.error);

Expected: Tokens print one by one as they arrive, not all at once.

If it fails:

  • response.body is null: Your Node.js version is below 18. Run node --version — upgrade to Node 22+.
  • Tokens appear all at once: You're missing stream: true in the request body.
  • JSON.parse throws frequently: Check buffer handling — the lines.pop() is critical to avoid splitting mid-JSON.
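
To see why the buffer and lines.pop() pattern matters, here is a standalone demo. The two simulated network chunks (hypothetical payloads, no API call involved) split one SSE line mid-JSON, and the buffer reassembles it:

```javascript
// Simulated reads from the network: chunk 1 ends mid-JSON.
const chunks = ['data: {"a":1}\ndata: {"b"', ':2}\ndata: [DONE]\n'];

let buffer = "";
const completeLines = [];
for (const chunk of chunks) {
  buffer += chunk;                  // may end with a partial line
  const lines = buffer.split("\n");
  buffer = lines.pop();             // hold back the incomplete tail
  completeLines.push(...lines);
}

console.log(completeLines);
// [ 'data: {"a":1}', 'data: {"b":2}', 'data: [DONE]' ]
```

Without the pop(), the second element would be the unparseable fragment data: {"b" — the exact input that makes JSON.parse throw.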

[Screenshot: terminal streaming tokens in real time; each word arrives within milliseconds]


Step 3: Stream with the OpenAI SDK (Recommended)

DeepSeek's API is OpenAI-compatible, so the openai SDK works against it unchanged. The SDK handles chunking, retries, and error parsing for you.

// stream-sdk.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com", // Point to DeepSeek instead of OpenAI
});

async function streamDeepSeekSDK(prompt) {
  const stream = await client.chat.completions.create({
    model: "deepseek-chat",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  // SDK returns an async iterable — no manual chunk parsing needed
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(token);
  }

  // Print a trailing newline once the stream ends
  console.log("\n");
}

streamDeepSeekSDK("What is the difference between parallelism and concurrency?")
  .catch(console.error);

Run it:

node --input-type=module < stream-sdk.js
# Or add "type": "module" to package.json and run: node stream-sdk.js

Why use the SDK over raw fetch? It retries failed requests (including rate-limited ones), parses SSE chunks, and consumes [DONE] for you. Use raw fetch only when you need zero dependencies.


Step 4: Handle Errors Mid-Stream

Streams can fail after they've started (network drop, rate limit mid-response). Catch these separately from startup errors.

async function streamWithErrorHandling(prompt) {
  let stream;

  try {
    stream = await client.chat.completions.create({
      model: "deepseek-chat",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    });
  } catch (err) {
    // Startup errors: auth, bad request, network unreachable
    console.error("Failed to start stream:", err.message);
    return;
  }

  try {
    for await (const chunk of stream) {
      const token = chunk.choices[0]?.delta?.content ?? "";
      process.stdout.write(token);
    }
  } catch (err) {
    // Mid-stream errors: connection reset, timeout, server error
    console.error("\nStream interrupted:", err.message);
    // Optionally: retry from where you left off using a token counter
  }
}

If it fails:

  • APIConnectionError: DeepSeek server is unreachable. Check your network or add a retry wrapper.
  • Errors mid-stream (connection reset, timeout, server-sent error event): the SDK surfaces these as errors thrown inside the for await loop. Log and retry after a delay.
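
A retry wrapper can be as small as this sketch (withRetry is a name introduced here, not part of the SDK; the attempt count and delays are arbitrary, and a stream that dies midway restarts from scratch):

```javascript
// Generic exponential-backoff wrapper: retry fn up to `attempts` times,
// doubling the delay after each failure (baseMs, 2*baseMs, 4*baseMs, ...).
async function withRetry(fn, attempts = 3, baseMs = 1000) {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts - 1) throw err; // out of attempts: rethrow
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
}

// Usage: wrap the whole stream, not individual chunks, e.g.
// await withRetry(() => streamWithErrorHandling("Explain SSE."));
```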

Verification

node stream-sdk.js

You should see: text appearing word-by-word in your terminal, with the first token arriving well under a second on a typical connection.

[Screenshot: successful streaming output in the terminal; tokens arrive progressively]

Run a quick latency check:

// Inside streamDeepSeekSDK, replace the for await loop with this:
const start = Date.now();
let firstToken = true;

for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content ?? "";
  if (token && firstToken) {
    console.log(`\nTime to first token: ${Date.now() - start}ms`);
    firstToken = false;
  }
  process.stdout.write(token);
}

Expected: Time to first token under 800ms on a typical connection.


What You Learned

  • DeepSeek-V3 uses SSE streaming — you must buffer, split by \n, and strip the data: prefix before parsing JSON
  • The { stream: true } flag in the request body is not optional — omitting it returns a single blocking response
  • The OpenAI SDK works with DeepSeek out of the box by setting baseURL — use it for production
  • Raw fetch streaming requires TextDecoder with { stream: true } to handle multi-byte characters (e.g., Chinese text) without corruption

Limitation: This guide covers text completions only. If you're streaming tool calls or function outputs, delta.tool_calls has a different incremental structure — each chunk appends to arguments as a partial JSON string.
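
As a sketch of that accumulation, the delta objects below are hand-written in the OpenAI chunk shape (a real stream delivers them via chunk.choices[0].delta.tool_calls, and field details may differ):

```javascript
// Reassembling a streamed tool call: only the first delta carries id and
// name; every delta appends a fragment of the arguments JSON string.
const toolCallDeltas = [
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Paris"}' } },
];

const calls = {};
for (const d of toolCallDeltas) {
  const slot = (calls[d.index] ??= { id: "", name: "", arguments: "" });
  if (d.id) slot.id = d.id;
  if (d.function?.name) slot.name = d.function.name;
  if (d.function?.arguments) slot.arguments += d.function.arguments;
}

console.log(calls[0].name, JSON.parse(calls[0].arguments));
// get_weather { city: 'Paris' }
```

Only parse arguments once the stream is finished — mid-stream it is a partial JSON string and JSON.parse will throw.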

When NOT to use streaming: For batch jobs, embeddings, or when you need the full response before acting on it (e.g., structured JSON output) — use non-streaming mode and parse choices[0].message.content directly.


Tested on Node.js 22.x, openai SDK 4.x, DeepSeek-V3 (deepseek-chat model), macOS & Ubuntu 24.04