Build a Multi-Model Chatbot with LiteLLM in 30 Minutes

Learn to build a chatbot that switches between GPT-4, Claude, and Gemini using LiteLLM's unified API — no rewrites needed.

Problem: Your Chatbot Is Locked to One LLM

You've built a chatbot using the OpenAI SDK. Now your team wants Claude fallback support, Gemini for cost-sensitive routes, and a local Ollama model for dev. That means rewriting API calls, auth logic, and response parsing — for every new provider.

LiteLLM fixes this with a single unified interface.

You'll learn:

  • How to call 100+ LLM providers with one function
  • How to build a chatbot that hot-swaps models at runtime
  • How to add fallbacks so your app survives API outages

Time: 30 min | Level: Intermediate


Why This Happens

Every LLM provider ships a different SDK with different method signatures, auth patterns, and response shapes. Switching from openai.chat.completions.create() to Anthropic's client.messages.create() isn't just a rename — it's a different object structure, different streaming API, different error types.

LiteLLM wraps all of them behind the OpenAI spec. Your code calls litellm.completion() and never needs to know which provider is underneath.

Common symptoms this solves:

  • "We need Claude support but our codebase only speaks OpenAI"
  • Provider outages taking down your whole app
  • Dev costs blowing up because you can't easily route to cheaper models

Solution

Step 1: Install and Configure LiteLLM

pip install litellm python-dotenv

Create a .env file for your API keys:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...

LiteLLM reads standard env var names automatically — no extra config needed.
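Before making any calls, it helps to fail fast on missing keys rather than hitting an AuthenticationError on the first request. A minimal sketch (missing_keys is a hypothetical helper, not part of LiteLLM; the exact env var name each provider expects is listed in LiteLLM's provider docs, e.g. GEMINI_API_KEY for gemini/ models):

```python
import os

def missing_keys(required: list[str]) -> list[str]:
    """Return the names from `required` whose env vars are unset or empty."""
    return [name for name in required if not os.getenv(name)]

# Check once at startup instead of failing on the first request
needed = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"]
absent = missing_keys(needed)
if absent:
    print(f"Warning: missing API keys: {', '.join(absent)}")
```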

Step 2: Your First Multi-Model Call

import litellm
from dotenv import load_dotenv

load_dotenv()

def chat(message: str, model: str = "gpt-4o") -> str:
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": message}]
    )
    # Same response shape regardless of provider
    return response.choices[0].message.content

# Swap providers by changing the model string — nothing else changes
print(chat("Explain async/await in Python", model="gpt-4o"))
print(chat("Explain async/await in Python", model="claude-3-5-sonnet-20241022"))
print(chat("Explain async/await in Python", model="gemini/gemini-1.5-pro"))

Expected: All three return a response. Same code, different providers.

If it fails:

  • AuthenticationError: Check your .env key names match LiteLLM's expected format (e.g. ANTHROPIC_API_KEY, not CLAUDE_KEY)
  • ModelNotFoundError: Use LiteLLM's exact model strings — check docs.litellm.ai/docs/providers

[Screenshot: terminal showing successful responses from three different LLM providers. All three return the same response structure — no parsing changes needed.]

Step 3: Build the Chatbot Class

Now for the core of the build: a stateful chatbot that holds conversation history and lets you swap the model mid-conversation.

import litellm
from dotenv import load_dotenv

load_dotenv()

class MultiModelChatbot:
    def __init__(self, model: str = "gpt-4o"):
        self.model = model
        self.history: list[dict] = []

    def switch_model(self, model: str) -> None:
        # History carries over — context is preserved across model switches
        self.model = model
        print(f"Switched to: {model}")

    def chat(self, user_message: str) -> str:
        self.history.append({"role": "user", "content": user_message})

        response = litellm.completion(
            model=self.model,
            messages=self.history,
        )

        assistant_reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": assistant_reply})
        return assistant_reply

    def clear(self) -> None:
        self.history = []
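One thing the class glosses over: self.history grows without bound, and every provider enforces a context limit. A minimal sketch of message-count trimming (trim_history and max_messages are illustrative assumptions; production trimming should count tokens, not messages):

```python
def trim_history(history: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep only the most recent messages, preserving a leading system message if present."""
    if history and history[0].get("role") == "system":
        tail = history[1:][-(max_messages - 1):] if max_messages > 1 else []
        return history[:1] + tail
    return history[-max_messages:]
```

You would call this inside chat(), e.g. passing messages=trim_history(self.history) to litellm.completion instead of the raw list.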

Step 4: Add Fallbacks for Reliability

LiteLLM has built-in fallback routing. If your primary model fails, it automatically retries with the next one.

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
    # Retry the same model twice before falling back
    num_retries=2,
)

Why this matters: every major provider has occasional outages. With fallbacks, your app degrades gracefully instead of returning errors to users.
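If you want to see the mechanics (or need custom control over retry order), the pattern behind fallbacks is just ordered retries. A hand-rolled sketch, where complete_with_fallbacks is a hypothetical helper, not a LiteLLM API, and call is any function that performs one completion request:

```python
def complete_with_fallbacks(call, models: list[str], num_retries: int = 2):
    """Try each model in order; retry each up to num_retries extra times before moving on."""
    last_err = None
    for model in models:
        for _attempt in range(num_retries + 1):
            try:
                return call(model)
            except Exception as err:
                last_err = err  # remember the failure, move to next attempt/model
    raise last_err

# With LiteLLM it would be wired up roughly like:
# complete_with_fallbacks(
#     lambda m: litellm.completion(model=m, messages=msgs),
#     ["gpt-4o", "claude-3-5-sonnet-20241022", "gemini/gemini-1.5-pro"],
# )
```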

Step 5: Wire Up a CLI Interface

def run():
    bot = MultiModelChatbot(model="gpt-4o")
    print("Multi-model chatbot ready. Commands: /switch <model>, /clear, /quit")

    while True:
        user_input = input("\nYou: ").strip()

        if not user_input:
            continue
        elif user_input == "/quit":
            break
        elif user_input == "/clear":
            bot.clear()
            print("History cleared.")
        elif user_input.startswith("/switch "):
            model = user_input.split(" ", 1)[1]
            bot.switch_model(model)
        else:
            reply = bot.chat(user_input)
            print(f"\nBot ({bot.model}): {reply}")

if __name__ == "__main__":
    run()
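As written, a typo in /switch (or a provider outage) raises out of bot.chat() and kills the loop. A defensive sketch (safe_chat is a hypothetical wrapper; LiteLLM surfaces provider failures as OpenAI-style exceptions, caught broadly here for brevity):

```python
def safe_chat(bot, message: str) -> str:
    """Return the reply, or a readable error message instead of crashing the REPL."""
    try:
        return bot.chat(message)
    except Exception as err:
        return f"[{bot.model} failed: {err}. Try /switch to another model]"
```

In run(), replace reply = bot.chat(user_input) with reply = safe_chat(bot, user_input).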

[Screenshot: CLI chatbot session switching from GPT-4o to Claude mid-conversation — history persists across the switch.]


Verification

python chatbot.py

You should see the startup line Multi-model chatbot ready. followed by working responses. Test the switch:

You: What's the capital of France?
Bot (gpt-4o): Paris is the capital of France.

You: /switch claude-3-5-sonnet-20241022
Switched to: claude-3-5-sonnet-20241022

You: What did I just ask you?
Bot (claude-3-5-sonnet-20241022): You asked about the capital of France.

Claude correctly sees the prior GPT-4o exchange. History works across providers.


What You Learned

  • LiteLLM gives you a single completion() call that works across 100+ providers without code changes
  • Conversation history is just a list of dicts — model-agnostic by design
  • Built-in fallbacks handle provider outages automatically
  • Limitation: Streaming responses need a small tweak (stream=True + iterate response as a generator); the pattern above is non-streaming only
  • When NOT to use this: if you depend on provider-specific features (e.g. OpenAI Assistants). Also note that function-calling schemas differ slightly across providers — test each one

Tested on LiteLLM 1.55+, Python 3.12, macOS & Ubuntu 24.04