Build a Cross-Lingual Customer Support Bot in 45 Minutes

Step-by-step guide to building a multilingual support bot with language detection, translation, and context-aware replies using Python, FastAPI, and an LLM API.

Problem: Your Support Bot Only Speaks One Language

Your customer support bot works fine in English — but 40% of your users write in Spanish, French, Arabic, or Mandarin. Right now, they either get gibberish back or no response at all.

You'll learn:

  • How to detect incoming message language automatically
  • How to route, translate, and respond without losing context
  • How to deploy this with FastAPI and keep latency under 300ms

Time: 45 min | Level: Intermediate


Why This Happens

Most support bots are built on single-language embeddings and prompts. When a non-English message comes in, the vector search returns irrelevant results and the LLM either hallucinates or falls back to English — neither of which helps the user.

The fix is a three-layer pipeline: detect the language, translate the query into a working language, retrieve and generate in that language, then translate the response back.

Common symptoms:

  • Bot responds in English to non-English queries
  • Vector search returns empty or mismatched results
  • Users churn after first non-English message fails

Solution

Step 1: Set Up the Project

pip install fastapi uvicorn langdetect deep-translator openai chromadb

Create your project structure:

support-bot/
├── main.py
├── pipeline.py
├── kb_loader.py
└── requirements.txt

Step 2: Detect the Incoming Language

# pipeline.py
from langdetect import detect, LangDetectException

SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "ar", "zh-cn", "pt", "ja"}

def detect_language(text: str) -> str:
    try:
        lang = detect(text)
        # langdetect returns zh-cn for simplified Chinese
        return lang if lang in SUPPORTED_LANGUAGES else "en"
    except LangDetectException:
        # Fall back to English if detection fails (very short strings)
        return "en"

Expected: Returns an ISO 639-1 code like "es" or "fr" (simplified Chinese comes back as "zh-cn").

If it fails:

  • LangDetectException on short strings: Add a minimum length check — anything under 10 characters defaults to "en".
  • Wrong detection on mixed-language input: Use detect_langs() instead and pick the highest-probability result.

Step 3: Translate to English for Retrieval

Your knowledge base is in English. Translate the query before searching — don't translate the knowledge base itself.

# pipeline.py
from deep_translator import GoogleTranslator

# deep-translator expects "zh-CN"; langdetect emits lowercase "zh-cn"
LANG_CODE_MAP = {"zh-cn": "zh-CN"}

def to_english(text: str, source_lang: str) -> str:
    if source_lang == "en":
        return text  # Skip the API call entirely

    source_lang = LANG_CODE_MAP.get(source_lang, source_lang)
    return GoogleTranslator(
        source=source_lang,
        target="en"
    ).translate(text)

def from_english(text: str, target_lang: str) -> str:
    if target_lang == "en":
        return text

    target_lang = LANG_CODE_MAP.get(target_lang, target_lang)
    return GoogleTranslator(
        source="en",
        target=target_lang
    ).translate(text)

Why translate to English first: Most embedding models (OpenAI, Sentence Transformers) perform significantly better on English text. Translating the query gives you better retrieval than multilingual embeddings alone.
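Each translation is also a network round-trip, which eats into a 300ms latency budget fast. One way to claw that back is a small LRU cache in front of the translator, since support queries repeat heavily. A sketch — make_cached_translator and translate_fn are names introduced here; the injectable function keeps it backend-agnostic, so it wraps the GoogleTranslator-based to_english above or any replacement:

```python
from collections import OrderedDict
from typing import Callable

def make_cached_translator(
    translate_fn: Callable[[str, str], str],
    maxsize: int = 2048,
) -> Callable[[str, str], str]:
    """Wrap a (text, source_lang) -> str translator with an in-memory LRU cache."""
    cache: OrderedDict[tuple[str, str], str] = OrderedDict()

    def translate(text: str, source_lang: str) -> str:
        if source_lang == "en":
            return text  # Same fast path as to_english()
        key = (text, source_lang)
        if key in cache:
            cache.move_to_end(key)  # Mark as recently used
            return cache[key]
        result = translate_fn(text, source_lang)
        cache[key] = result
        if len(cache) > maxsize:
            cache.popitem(last=False)  # Evict the least recently used entry
        return result

    return translate
```

Usage would be cached_to_english = make_cached_translator(to_english); repeat FAQ phrasings then skip the translation API entirely. functools.lru_cache on to_english works too, but has no way to skip caching per-session text you'd rather not hold in memory.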


Step 4: Build the Retrieval-Augmented Pipeline

# pipeline.py
import chromadb
from openai import OpenAI

client = OpenAI()
# PersistentClient, not Client(): the in-memory client can't see the KB
# that kb_loader.py (a separate process) writes to disk
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("support_kb")

def get_relevant_docs(query_en: str, n_results: int = 3) -> list[str]:
    results = collection.query(
        query_texts=[query_en],
        n_results=n_results
    )
    return results["documents"][0]  # List of matching KB articles

def generate_response(query_en: str, context_docs: list[str]) -> str:
    context = "\n\n".join(context_docs)
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Fast and cheap for support use cases
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful customer support agent. "
                    "Answer using only the provided context. "
                    "If the answer isn't in the context, say so clearly. "
                    "Keep answers concise — 2-4 sentences max.\n\n"
                    f"Context:\n{context}"
                )
            },
            {"role": "user", "content": query_en}
        ],
        max_tokens=300,
        temperature=0.2  # Low temperature = consistent, factual responses
    )
    return response.choices[0].message.content

Step 5: Wire the Full Pipeline

# pipeline.py
def handle_message(user_message: str) -> dict:
    # 1. Detect language
    lang = detect_language(user_message)
    
    # 2. Translate to English
    query_en = to_english(user_message, lang)
    
    # 3. Retrieve relevant docs
    docs = get_relevant_docs(query_en)
    
    # 4. Generate response in English
    response_en = generate_response(query_en, docs)
    
    # 5. Translate response back to user's language
    response_local = from_english(response_en, lang)
    
    return {
        "response": response_local,
        "detected_language": lang,
        "sources_found": len(docs)
    }

Step 6: Expose It with FastAPI

# main.py
from fastapi import FastAPI
from pydantic import BaseModel
from pipeline import handle_message

app = FastAPI()

class MessageRequest(BaseModel):
    message: str
    session_id: str | None = None  # Optional for future conversation history

@app.post("/chat")
def chat(request: MessageRequest):
    # Plain def, not async def: FastAPI runs sync endpoints in a threadpool,
    # so the blocking translation and LLM calls don't stall the event loop
    return handle_message(request.message)

Run it:

uvicorn main:app --reload --port 8000

Step 7: Load Your Knowledge Base

# kb_loader.py
import chromadb
import json

def load_kb_from_json(filepath: str):
    # Persist to disk so the FastAPI server (a separate process) can read it;
    # the path must match the one used in pipeline.py
    chroma = chromadb.PersistentClient(path="./chroma_db")
    collection = chroma.get_or_create_collection("support_kb")

    with open(filepath) as f:
        articles = json.load(f)  # [{"id": "1", "content": "..."}]

    collection.add(
        documents=[a["content"] for a in articles],
        ids=[a["id"] for a in articles]
    )
    print(f"Loaded {len(articles)} articles into ChromaDB")

if __name__ == "__main__":
    load_kb_from_json("kb_articles.json")

Run once before starting the server:

python kb_loader.py

Verification

Test it:

# English
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I reset my password?"}'

# Spanish
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "¿Cómo puedo restablecer mi contraseña?"}'

You should see: Both return a response in the message's original language, with "sources_found": 3 (or your KB's article count, if it has fewer than 3).

If sources_found is 0: Your knowledge base isn't loaded or the query is too far from your KB content. Check ChromaDB has articles with collection.count().


What You Learned

  • Language detection + translate-then-retrieve gives you better results than multilingual embeddings for most KB sizes
  • deep-translator wraps Google Translate for free at low volume — switch to DeepL or Azure Translator in production for reliability
  • Keeping the LLM prompt in English and only translating output keeps response quality high across all languages

Limitation: langdetect is probabilistic. For very short messages (under 10 words), accuracy drops to ~80%. Consider asking users to confirm their language on first contact in high-stakes flows.

When NOT to use this approach: If your KB has 50,000+ articles in multiple languages natively, skip the translation layer and use a multilingual embedding model like paraphrase-multilingual-mpnet-base-v2 directly — it'll be faster and cheaper.


Tested on Python 3.12, FastAPI 0.115, ChromaDB 0.5, deep-translator 1.11 — macOS & Ubuntu