Share Your Local AI Model Over the Internet Securely via Ngrok

Expose your local LLM (Ollama, LM Studio, etc.) over the internet with ngrok tunnels, auth tokens, and HTTPS in under 20 minutes.

Problem: Your Local AI Model Is Stuck on Localhost

You're running a local LLM (Ollama, LM Studio, or a custom FastAPI wrapper) and need to share it with a teammate, test a webhook, or access it from your phone. But it only listens on localhost (port 11434 for Ollama), so the outside world can't reach it.

You'll learn:

  • How to expose a local AI model endpoint securely via ngrok
  • How to lock it down with bearer token auth so strangers can't abuse it
  • How to make the tunnel persistent with a stable domain

Time: 20 min | Level: Intermediate


Why This Happens

Local servers bind to 127.0.0.1 by default, your loopback interface, so traffic from the internet never reaches them. Ngrok works around this from the inside: the ngrok agent on your machine opens an outbound, encrypted connection to ngrok's edge servers, which give you a public HTTPS URL and forward each request back through that connection to your local port.
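To make the loopback restriction concrete, here is a small standard-library Python sketch that opens a listener the way most local model servers do by default, bound to 127.0.0.1, and connects to it from the same machine. The port is picked by the OS; nothing here talks to Ollama or ngrok.

```python
import socket


def loopback_listener() -> socket.socket:
    """Open a TCP listener bound only to the loopback interface; the OS picks the port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    return server


if __name__ == "__main__":
    server = loopback_listener()
    host, port = server.getsockname()
    print(f"listening on {host}:{port}")
    # This works because the connection originates on the same machine.
    # A remote client can't address this socket: for it, 127.0.0.1
    # always means its own machine, not yours.
    with socket.create_connection((host, port), timeout=2):
        print("loopback connection accepted")
    server.close()
```

An ngrok tunnel never changes this binding; the agent simply runs on the same machine and connects over loopback, just like the client above.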

The risk: an open tunnel means anyone who finds your URL can hammer your GPU. That's why auth is non-negotiable before you share the link.

Common use cases:

  • Sharing a model endpoint with a remote collaborator
  • Calling your local LLM from a cloud-hosted app or n8n workflow
  • Testing mobile clients that can't hit localhost

Solution

Step 1: Install and Authenticate Ngrok

If you haven't already, install ngrok and connect your account.

# macOS
brew install ngrok

# Linux
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null \
  && echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list \
  && sudo apt update && sudo apt install ngrok

# Windows (via Chocolatey)
choco install ngrok

Then authenticate with your ngrok token (free at ngrok.com):

ngrok config add-authtoken YOUR_AUTHTOKEN_HERE

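Expected: Authtoken saved to configuration file: followed by the config path, which is ~/.config/ngrok/ngrok.yml on Linux and ~/Library/Application Support/ngrok/ngrok.yml on macOS. Run ngrok config check to print the exact path on your system.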

If it fails:

  • "command not found": Restart your terminal or check your PATH
  • "invalid token": Copy the token fresh from your ngrok dashboard — don't include extra whitespace

Step 2: Make Sure Your Model Is Running

Before tunneling, confirm your local model is actually listening.

# For Ollama (default port 11434)
curl http://localhost:11434/api/tags

# For LM Studio (default port 1234)
curl http://localhost:1234/v1/models

# For a custom FastAPI / uvicorn server
curl http://localhost:8000/health

Expected: A JSON response (for Ollama and LM Studio, a list of your available models; for a custom server, whatever its health route returns). If you get Connection refused, start your model server first.

Ollama returning available models on localhost: confirm this before tunneling
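If you'd rather script this check (say, before launching a tunnel from a startup script), here is a minimal standard-library Python sketch. It only tests whether each default port from the commands above accepts TCP connections; it doesn't confirm the listener is actually a model server, which is what the curl calls are for. The port map is an assumption based on those defaults.

```python
import socket

# Default ports from the curl commands above; adjust if you changed them.
DEFAULT_PORTS = {"Ollama": 11434, "LM Studio": 1234, "custom FastAPI": 8000}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "listening" if is_listening("127.0.0.1", port) else "not listening"
        print(f"{name:15} port {port:5}: {state}")
```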


Step 3: Open the Tunnel

Start ngrok pointing at your model's port.

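# Ollama (rewrite the Host header, since Ollama rejects requests for non-local hosts)
ngrok http 11434 --host-header="localhost:11434"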

# LM Studio
ngrok http 1234

# Custom server
ngrok http 8000

Expected output in the ngrok console:

Forwarding  https://a1b2-203-0-113-42.ngrok-free.app -> http://localhost:11434

Your model is now reachable at that HTTPS URL. But stop here before sharing it: without auth, it's wide open.

The Forwarding line gives you your public URL: copy it, but don't share it yet
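Instead of copying the URL from the console by hand, you can read it from the ngrok agent's local API, which by default serves tunnel details at http://127.0.0.1:4040/api/tunnels. A minimal Python sketch, assuming that default address and relying only on the public_url and config.addr fields of each tunnel entry:

```python
import json
import urllib.request

# Assumption: the ngrok agent's local API is on its default address.
AGENT_API = "http://127.0.0.1:4040/api/tunnels"


def public_urls(payload: dict) -> dict[str, str]:
    """Map each tunnel's local upstream address to its public URL."""
    return {
        t.get("config", {}).get("addr", "?"): t["public_url"]
        for t in payload.get("tunnels", [])
        if "public_url" in t
    }


def fetch_public_urls(api: str = AGENT_API) -> dict[str, str]:
    """Query the running agent and return its tunnels' public URLs."""
    with urllib.request.urlopen(api, timeout=5) as resp:
        return public_urls(json.load(resp))


if __name__ == "__main__":
    try:
        for upstream, url in fetch_public_urls().items():
            print(f"{url} -> {upstream}")
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"ngrok agent API not reachable: {exc}")
```

The parsing is kept separate from the HTTP call so you can reuse it with whatever client you prefer.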


Step 4: Add Bearer Token Authentication

Open your ngrok config file (ngrok config edit opens it in your default editor) and add a traffic policy to the tunnel. This one rejects, with a 401, any request that doesn't carry your secret token.

# ngrok.yml (e.g. ~/.config/ngrok/ngrok.yml on Linux)
version: "2"
authtoken: YOUR_AUTHTOKEN_HERE

tunnels:
  local-ai:
    proto: http
    addr: 11434
    # Ollama only accepts local Host headers, so rewrite it
    host_header: "localhost:11434"
    # Deny every request that lacks the exact bearer token
    traffic_policy:
      on_http_request:
        - expressions:
            - "!('authorization' in req.headers) || req.headers['authorization'][0] != 'Bearer YOUR_SECRET_TOKEN'"
          actions:
            - type: deny
              config:
                status_code: 401
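The rule's logic is small enough to state in code. This Python function only illustrates the decision the policy makes for each request (it is not how ngrok evaluates policies internally), but it is handy if you later move the check into a custom FastAPI wrapper. Note that a request with no Authorization header at all must be denied too, not silently let through.

```python
import hmac


def is_authorized(headers: dict[str, str], secret: str) -> bool:
    """Allow only an exact 'Bearer <secret>' Authorization header.

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("authorization")
    if supplied is None:
        return False  # missing header: deny
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(supplied.encode(), f"Bearer {secret}".encode())
```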

Generate a strong token before you paste it in:

# Generate a 32-byte hex secret
openssl rand -hex 32
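If OpenSSL isn't available (common on Windows), Python's standard secrets module produces an equivalent value: 32 random bytes rendered as 64 hex characters.

```python
import secrets


def new_bearer_token(num_bytes: int = 32) -> str:
    """Cryptographically random hex token, like `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    print(new_bearer_token())
```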

Start the named tunnel:

ngrok start local-ai

Expected: Same forwarding output, but now unauthenticated requests get 401 Unauthorized.

Test it:

# Should fail
curl https://your-tunnel-url.ngrok-free.app/api/tags

# Should succeed
curl -H "Authorization: Bearer YOUR_SECRET_TOKEN" \
  https://your-tunnel-url.ngrok-free.app/api/tags
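The same two checks as a standard-library Python sketch, useful if you want to run them from a script or CI job. The helper name status_for is just for illustration; it returns the HTTP status code instead of raising on 4xx responses.

```python
import urllib.error
import urllib.request
from typing import Optional


def status_for(url: str, token: Optional[str] = None) -> int:
    """GET url, optionally with a bearer token, and return the HTTP status code."""
    req = urllib.request.Request(url)
    if token is not None:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; report the code instead
```

Against your tunnel, status_for("https://your-tunnel-url.ngrok-free.app/api/tags") should return 401, and the same call with YOUR_SECRET_TOKEN as the second argument should return 200.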

If it fails:

  • YAML parse error: Check indentation — YAML is whitespace-sensitive
  • Policy not applying: Confirm you're on ngrok's paid tier; traffic policies require a Pro plan or above

Step 5: Use a Stable Domain

Free ngrok tunnels get a random URL every restart. If you need a consistent endpoint, reserve a static domain in your ngrok dashboard, then reference it in your config:

tunnels:
  local-ai:
    proto: http
    addr: 11434
    domain: my-ai-model.ngrok.app  # Your reserved domain here
    host_header: "localhost:11434"
    traffic_policy:
      on_http_request:
        - expressions:
            - "!('authorization' in req.headers) || req.headers['authorization'][0] != 'Bearer YOUR_SECRET_TOKEN'"
          actions:
            - type: deny
              config:
                status_code: 401

Static domains are available on ngrok's free tier — one per account.


Step 6: Call Your Tunneled Model

With the tunnel running, you can call it from anywhere through the OpenAI SDK, because Ollama and LM Studio both expose an OpenAI-compatible /v1 API.

from openai import OpenAI

client = OpenAI(
    base_url="https://your-tunnel-url.ngrok-free.app/v1",  # Your ngrok URL + /v1
    api_key="YOUR_SECRET_TOKEN",  # ngrok validates this as the bearer token
)

response = client.chat.completions.create(
    model="llama3.2",  # Whatever model you have loaded
    messages=[{"role": "user", "content": "Hello from the internet!"}],
)

print(response.choices[0].message.content)

Why this works: The OpenAI SDK sends Authorization: Bearer YOUR_SECRET_TOKEN automatically when you pass it as api_key. Ngrok's policy validates it, then forwards the request to your local Ollama or LM Studio instance.
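To see what "sends the header automatically" amounts to, here is a simplified standard-library Python sketch of the equivalent raw request. It assumes the OpenAI-compatible /v1/chat/completions route that Ollama and LM Studio expose, and it leaves out the extra headers the real SDK adds.

```python
import json
import urllib.request


def chat_request(base_url: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST roughly equivalent to client.chat.completions.create(...)."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",  # what api_key becomes on the wire
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it: open urllib.request.urlopen(chat_request("https://your-tunnel-url.ngrok-free.app/v1", "YOUR_SECRET_TOKEN", "llama3.2", "Hello")) in a with block and read choices[0].message.content from the parsed JSON body.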


Verification

Run a full end-to-end test from a different machine or a phone hotspot (so you're genuinely going through the internet, not localhost):

curl -X POST https://your-tunnel-url.ngrok-free.app/api/generate \
  -H "Authorization: Bearer YOUR_SECRET_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'

You should see: A JSON response with a response field containing generated text and no errors.

A clean JSON response confirms your tunnel, auth, and model are all working correctly
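The same end-to-end check as a Python sketch, for scripting it on the remote machine. It mirrors the curl call above and assumes only what that step states: the non-streaming reply carries the generated text in its response field.

```python
import json
import urllib.request


def generate(base_url: str, token: str, model: str, prompt: str) -> str:
    """Call Ollama's /api/generate through the tunnel and return the generated text."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    # Generation can be slow on a busy GPU, so allow a generous timeout
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_text(resp.read())


def extract_text(body: bytes) -> str:
    """Pull the 'response' field out of a non-streaming /api/generate reply."""
    return json.loads(body)["response"]
```

Usage: print(generate("https://your-tunnel-url.ngrok-free.app", "YOUR_SECRET_TOKEN", "llama3.2", "Say hello")).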


What You Learned

  • Ngrok tunnels your local port through an encrypted connection to a public HTTPS URL
  • Traffic policies (bearer token auth) are essential — never share an open tunnel URL
  • The OpenAI SDK's api_key parameter doubles as your bearer token when pointed at ngrok
  • Static domains prevent your URL from changing on every restart

Limitations to know:

  • Ngrok free tier has bandwidth and connection limits — fine for testing, not for production load
  • Traffic policies require a Pro plan or above; on the free tier, fall back to ngrok's HTTP basic auth (the basic_auth tunnel option)
  • Your model is only reachable while ngrok is running — add it to a startup script if you need persistence
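For a quick persistence hack on your own machine, a small Python wrapper can relaunch the agent when it crashes. This is a minimal sketch, not a substitute for a real service manager (systemd, launchd, or a Windows service), and its restart policy (stop on a clean exit, give up after max_restarts failures) is an arbitrary choice for illustration.

```python
import subprocess
import time


def keep_running(cmd: list[str], max_restarts: int = 5, delay: float = 2.0) -> int:
    """Run cmd, restarting it after a non-zero exit, up to max_restarts times.

    Returns how many times the command was started.
    """
    starts = 0
    while True:
        starts += 1
        code = subprocess.run(cmd).returncode
        if code == 0 or starts > max_restarts:
            return starts
        print(f"{cmd[0]} exited with code {code}; restarting in {delay}s")
        time.sleep(delay)  # brief back-off before relaunching
```

Usage: keep_running(["ngrok", "start", "local-ai"]) relaunches the named tunnel from Step 4 whenever the agent exits with an error.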

When NOT to use this:

  • Running inference for more than a small team: use a proper cloud GPU instead
  • Sending sensitive data in prompts: your traffic transits ngrok's servers
  • Production deployments: this is a development and collaboration tool

Tested with Ollama 0.5.x, LM Studio 0.3.x, ngrok 3.x on macOS Sequoia and Ubuntu 24.04