Share Your Local AI Model Over the Internet Securely via Ngrok

Problem: Your Local AI Model Is Stuck on Localhost

You're running a local LLM — Ollama, LM Studio, or a custom FastAPI wrapper — and need to share it with a teammate, test a webhook, or access it from your phone. But it only listens on localhost:11434 and the world can't see it.

You'll learn:

How to expose a local AI model endpoint securely via ngrok
How to lock it down with bearer token auth so strangers can't abuse it
How to make the tunnel persistent with a stable domain

Time: 20 min | Level: Intermediate

Why This Happens

Local servers bind to 127.0.0.1 by default — your loopback interface. Traffic from the internet never reaches it. Ngrok solves this by creating an encrypted tunnel from their edge servers to your machine, giving you a public HTTPS URL that proxies to your local port.

The risk: an open tunnel means anyone who finds your URL can hammer your GPU. That's why auth is non-negotiable before you share the link.

Common use cases:

Sharing a model endpoint with a remote collaborator
Calling your local LLM from a cloud-hosted app or n8n workflow
Testing mobile clients that can't hit localhost

Solution

Step 1: Install and Authenticate Ngrok

If you haven't already, install ngrok and connect your account.

# macOS
brew install ngrok

# Linux
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null \
  && echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list \
  && sudo apt update && sudo apt install ngrok

# Windows (via Chocolatey)
choco install ngrok

Then authenticate with your ngrok token (free at ngrok.com):

ngrok config add-authtoken YOUR_AUTHTOKEN_HERE

Expected: Authtoken saved to configuration file: ~/.config/ngrok/ngrok.yml

If it fails:

"command not found": Restart your Terminal or check your PATH
"invalid token": Copy the token fresh from your ngrok dashboard — don't include extra whitespace

Step 2: Make Sure Your Model Is Running

Before tunneling, confirm your local model is actually listening.

# For Ollama (default port 11434)
curl http://localhost:11434/api/tags

# For LM Studio (default port 1234)
curl http://localhost:1234/v1/models

# For a custom FastAPI / uvicorn server
curl http://localhost:8000/health

Expected: A JSON response listing your available models. If you get Connection refused, start your model server first.

Terminal showing Ollama responding to a local curl request Ollama returning available models on localhost — confirm this before tunneling

Step 3: Open the Tunnel

Start ngrok pointing at your model's port.

# Ollama
ngrok http 11434

# LM Studio
ngrok http 1234

# Custom server
ngrok http 8000

Expected output in the ngrok console:

Forwarding  https://a1b2-203-0-113-42.ngrok-free.app -> http://localhost:11434

Your model is now reachable at that HTTPS URL. But — stop here before sharing it. Without auth, it's wide open.

Ngrok console showing active tunnel with public HTTPS URL The Forwarding line gives you your public URL — copy it but don't share it yet

Step 4: Add Bearer Token Authentication

Create or edit ~/.config/ngrok/ngrok.yml to add an auth header policy. This rejects any request that doesn't include your secret token.

# ~/.config/ngrok/ngrok.yml
version: "3"
authtoken: YOUR_AUTHTOKEN_HERE

tunnels:
  local-ai:
    proto: http
    addr: 11434
    # Require a bearer token on all incoming requests
    traffic_policy:
      inbound:
        - actions:
            - type: restrict-ips
              config:
                enforce: true
                allow: []  # Optional: add your IP CIDR ranges here
        - expressions:
            - "req.headers['authorization'] != 'Bearer YOUR_SECRET_TOKEN'"
          actions:
            - type: deny
              config:
                status_code: 401

Generate a strong token before you paste it in:

# Generate a 32-byte hex secret
openssl rand -hex 32

Start the named tunnel:

ngrok start local-ai

Expected: Same forwarding output, but now unauthenticated requests get 401 Unauthorized.

Test it:

# Should fail
curl https://your-tunnel-url.ngrok-free.app/api/tags

# Should succeed
curl -H "Authorization: Bearer YOUR_SECRET_TOKEN" \
  https://your-tunnel-url.ngrok-free.app/api/tags

If it fails:

YAML parse error: Check indentation — YAML is whitespace-sensitive
Policy not applying: Confirm you're on ngrok's paid tier; traffic policies require a Pro plan or above

Step 5: Reserve a Stable Domain (Optional but Recommended)

Free ngrok tunnels get a random URL every restart. If you need a consistent endpoint, reserve a static domain in your ngrok dashboard, then reference it in your config:

tunnels:
  local-ai:
    proto: http
    addr: 11434
    domain: my-ai-model.ngrok.app  # Your reserved domain here
    traffic_policy:
      inbound:
        - expressions:
            - "req.headers['authorization'] != 'Bearer YOUR_SECRET_TOKEN'"
          actions:
            - type: deny
              config:
                status_code: 401

Static domains are available on ngrok's free tier — one per account.

Step 6: Call Your Tunneled Model

With the tunnel running, hit it from anywhere just like the OpenAI API.

from openai import OpenAI

client = OpenAI(
    base_url="https://your-tunnel-url.ngrok-free.app/v1",  # Your ngrok URL + /v1
    api_key="YOUR_SECRET_TOKEN",  # ngrok validates this as the bearer token
)

response = client.chat.completions.create(
    model="llama3.2",  # Whatever model you have loaded
    messages=[{"role": "user", "content": "Hello from the internet!"}],
)

print(response.choices[0].message.content)

Why this works: The OpenAI SDK sends Authorization: Bearer YOUR_SECRET_TOKEN automatically when you pass it as api_key. Ngrok's policy validates it, then forwards the request to your local Ollama or LM Studio instance.

Verification

Run a full end-to-end test from a different machine or a phone hotspot (so you're genuinely going through the internet, not localhost):

curl -X POST https://your-tunnel-url.ngrok-free.app/api/generate \
  -H "Authorization: Bearer YOUR_SECRET_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "prompt": "Say hello", "stream": false}'

You should see: A JSON response with a response field containing generated text and no errors.

Terminal showing successful curl response through ngrok tunnel A clean JSON response confirms your tunnel, auth, and model are all working correctly

What You Learned

Ngrok tunnels your local port through an encrypted connection to a public HTTPS URL
Traffic policies (bearer token auth) are essential — never share an open tunnel URL
The OpenAI SDK's api_key parameter doubles as your bearer token when pointed at ngrok
Static domains prevent your URL from changing on every restart

Limitations to know:

Ngrok free tier has bandwidth and connection limits — fine for testing, not for production load
Traffic policies require Pro plan and above; on free tier, use ngrok's basic HTTP auth (basic_auth key) as a fallback
Your model is only reachable while ngrok is running — add it to a startup script if you need persistence

When NOT to use this:

Running inference for more than a small team — use a proper cloud GPU instead
Storing sensitive data in prompts — your traffic transits ngrok's servers
Production deployments — this is a development and collaboration tool

Tested with Ollama 0.5.x, LM Studio 0.3.x, ngrok 3.x on macOS Sequoia and Ubuntu 24.04