Problem: Your Support Bot Only Speaks One Language
Your customer support bot works fine in English — but 40% of your users write in Spanish, French, Arabic, or Mandarin. Right now, they either get gibberish back or no response at all.
You'll learn:
- How to detect incoming message language automatically
- How to route, translate, and respond without losing context
- How to deploy this with FastAPI and keep latency under 300ms
Time: 45 min | Level: Intermediate
Why This Happens
Most support bots are built on single-language embeddings and prompts. When a non-English message comes in, the vector search returns irrelevant results and the LLM either hallucinates or falls back to English — neither of which helps the user.
The fix is a three-stage pipeline: detect the language, translate the query to a working language for retrieval and generation, then translate the response back.
Common symptoms:
- Bot responds in English to non-English queries
- Vector search returns empty or mismatched results
- Users churn after first non-English message fails
Solution
Step 1: Set Up the Project
```bash
pip install fastapi uvicorn langdetect deep-translator openai chromadb
```
Create your project structure:
```
support-bot/
├── main.py
├── pipeline.py
├── kb_loader.py
└── requirements.txt
```
Step 2: Detect the Incoming Language
```python
# pipeline.py
from langdetect import detect, LangDetectException

SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "ar", "zh-cn", "pt", "ja"}

def detect_language(text: str) -> str:
    try:
        lang = detect(text)
        # langdetect returns zh-cn for simplified Chinese
        return lang if lang in SUPPORTED_LANGUAGES else "en"
    except LangDetectException:
        # Fall back to English if detection fails (very short strings)
        return "en"
```
Expected: Returns a language code like "es" or "fr" (ISO 639-1, except "zh-cn" for simplified Chinese).
If it fails:
- LangDetectException on short strings: Add a minimum length check — anything under 10 characters defaults to "en".
- Wrong detection on mixed-language input: Use `detect_langs()` instead and pick the highest-probability result.
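Both fixes fit in one small wrapper. A sketch, stdlib-only for illustration: `detector` is injected rather than hard-coded, so with langdetect you would pass something like `lambda t: [(l.lang, l.prob) for l in detect_langs(t)]`.

```python
SUPPORTED_LANGUAGES = {"en", "es", "fr", "de", "ar", "zh-cn", "pt", "ja"}

def detect_with_guard(text: str, detector, min_chars: int = 10) -> str:
    """Detect language, defaulting to "en" for inputs too short to trust.

    `detector` is any callable returning (lang_code, probability) pairs.
    """
    if len(text.strip()) < min_chars:
        return "en"  # too short for reliable detection
    candidates = detector(text)  # e.g. [("es", 0.92), ("pt", 0.08)]
    best_lang, _ = max(candidates, key=lambda c: c[1])
    return best_lang if best_lang in SUPPORTED_LANGUAGES else "en"
```

The injection also makes the guard trivially unit-testable without network access or a real detector.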
Step 3: Translate to English for Retrieval
Your knowledge base is in English. Translate the query before searching — don't translate the knowledge base itself.
```python
# pipeline.py
from deep_translator import GoogleTranslator

def to_english(text: str, source_lang: str) -> str:
    if source_lang == "en":
        return text  # Skip the API call entirely
    return GoogleTranslator(
        source=source_lang,
        target="en",
    ).translate(text)

def from_english(text: str, target_lang: str) -> str:
    if target_lang == "en":
        return text
    return GoogleTranslator(
        source="en",
        target=target_lang,
    ).translate(text)
```
Why translate to English first: Most embedding models (OpenAI, Sentence Transformers) perform significantly better on English text. Translating the query gives you better retrieval than multilingual embeddings alone.
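Support queries repeat heavily ("how do I reset my password" arrives many times a day), so caching translations cuts both latency and API calls. A sketch using functools.lru_cache; `_translate` is a stand-in for the GoogleTranslator call, stubbed here so the example is self-contained:

```python
from functools import lru_cache

def _translate(text: str, source: str, target: str) -> str:
    # Stand-in for GoogleTranslator(source=source, target=target).translate(text)
    return f"[{source}->{target}] {text}"

@lru_cache(maxsize=2048)
def to_english_cached(text: str, source_lang: str) -> str:
    if source_lang == "en":
        return text  # no API call, nothing worth caching
    return _translate(text, source_lang, "en")
```

Note that lru_cache keys on the exact string, so trivially different phrasings still miss; normalizing whitespace and case before the lookup raises the hit rate.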
Step 4: Build the Retrieval-Augmented Pipeline
```python
# pipeline.py
import chromadb
from openai import OpenAI

client = OpenAI()
# PersistentClient stores the collection on disk, so the data loaded by
# kb_loader.py (a separate process) is visible to the API server.
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("support_kb")

def get_relevant_docs(query_en: str, n_results: int = 3) -> list[str]:
    results = collection.query(
        query_texts=[query_en],
        n_results=n_results,
    )
    return results["documents"][0]  # List of matching KB articles

def generate_response(query_en: str, context_docs: list[str]) -> str:
    context = "\n\n".join(context_docs)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Fast and cheap for support use cases
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful customer support agent. "
                    "Answer using only the provided context. "
                    "If the answer isn't in the context, say so clearly. "
                    "Keep answers concise — 2-4 sentences max.\n\n"
                    f"Context:\n{context}"
                ),
            },
            {"role": "user", "content": query_en},
        ],
        max_tokens=300,
        temperature=0.2,  # Low temperature = consistent, factual responses
    )
    return response.choices[0].message.content
```
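One refinement worth considering: when retrieval comes back empty, skip the completion call instead of paying an LLM to say "I don't know". A sketch; the fallback wording and the injected `generate` parameter are assumptions, not part of the pipeline above:

```python
NO_CONTEXT_FALLBACK = (
    "I couldn't find anything in our help articles about that. "
    "Let me connect you with a human agent."
)

def answer_with_guard(query_en: str, context_docs: list[str], generate) -> str:
    # Short-circuit before the LLM call when retrieval found nothing usable.
    if not any(doc.strip() for doc in context_docs):
        return NO_CONTEXT_FALLBACK
    return generate(query_en, context_docs)  # e.g. generate_response
```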
Step 5: Wire the Full Pipeline
```python
# pipeline.py
def handle_message(user_message: str) -> dict:
    # 1. Detect language
    lang = detect_language(user_message)
    # 2. Translate to English
    query_en = to_english(user_message, lang)
    # 3. Retrieve relevant docs
    docs = get_relevant_docs(query_en)
    # 4. Generate response in English
    response_en = generate_response(query_en, docs)
    # 5. Translate response back to user's language
    response_local = from_english(response_en, lang)
    return {
        "response": response_local,
        "detected_language": lang,
        "sources_found": len(docs),
    }
```
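Hitting the 300ms budget from the intro means knowing where the time goes: detection is microseconds, but translation and the LLM call each cost a network round trip. A minimal per-stage timer, stdlib only, with illustrative stage names:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, timings: dict):
    """Record wall-clock milliseconds for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000

# Wrap each stage of handle_message, for example:
timings: dict[str, float] = {}
with timed("detect", timings):
    lang = "es"  # stand-in for detect_language(user_message)
```

During development you can return `timings` alongside the response dict; in production, log it instead.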
Step 6: Expose It with FastAPI
```python
# main.py
from fastapi import FastAPI
from pydantic import BaseModel
from pipeline import handle_message

app = FastAPI()

class MessageRequest(BaseModel):
    message: str
    session_id: str | None = None  # Optional for future conversation history

@app.post("/chat")
def chat(request: MessageRequest):
    # Plain `def`, not `async def`: handle_message makes blocking network
    # calls, so FastAPI runs it in a threadpool instead of stalling the
    # event loop.
    return handle_message(request.message)
```
Run it:
```bash
uvicorn main:app --reload --port 8000
```
Step 7: Load Your Knowledge Base
```python
# kb_loader.py
import chromadb
import json

def load_kb_from_json(filepath: str):
    # Must match the client in pipeline.py: a persistent, on-disk store,
    # since this script runs in a separate process from the server.
    chroma = chromadb.PersistentClient(path="./chroma_db")
    collection = chroma.get_or_create_collection("support_kb")
    with open(filepath) as f:
        articles = json.load(f)  # [{"id": "1", "content": "..."}]
    collection.add(
        documents=[a["content"] for a in articles],
        ids=[a["id"] for a in articles],
    )
    print(f"Loaded {len(articles)} articles into ChromaDB")

if __name__ == "__main__":
    load_kb_from_json("kb_articles.json")
```
Run once before starting the server:
```bash
python kb_loader.py
```
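kb_loader.py expects kb_articles.json to be a list of objects with "id" and "content" keys. A minimal sample of that shape (the article text is invented for illustration):

```json
[
  {"id": "1", "content": "To reset your password, go to Settings > Security and click 'Reset password'. A reset link is emailed to you."},
  {"id": "2", "content": "Invoices can be downloaded as PDFs under Billing > History."}
]
```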
Verification
Test it:
```bash
# English
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I reset my password?"}'

# Spanish
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "¿Cómo puedo restablecer mi contraseña?"}'
```
You should see: Both return responses in the message's original language with "sources_found": 2 or more.
If sources_found is 0: Your knowledge base isn't loaded or the query is too far from your KB content. Check ChromaDB has articles with collection.count().
What You Learned
- Language detection + translate-then-retrieve gives you better results than multilingual embeddings for most KB sizes
- `deep-translator` wraps Google Translate for free at low volume — switch to DeepL or Azure Translator in production for reliability
- Keeping the LLM prompt in English and only translating the output keeps response quality high across all languages
Limitation: langdetect is probabilistic. For very short messages (under 10 words), accuracy drops to ~80%. Consider asking users to confirm their language on first contact in high-stakes flows.
When NOT to use this approach: If your KB has 50,000+ articles in multiple languages natively, skip the translation layer and use a multilingual embedding model like paraphrase-multilingual-mpnet-base-v2 directly — it'll be faster and cheaper.
Tested on Python 3.12, FastAPI 0.115, ChromaDB 0.5, deep-translator 1.11 — macOS & Ubuntu