I spent 2 weeks fighting with TensorFlow deployment configs until I figured out this exact approach.

What you'll build: A production-ready TensorFlow model API running on Kubernetes Time needed: 45 minutes (including coffee breaks) Difficulty: Beginner - I'll show you every command

Here's what makes this approach bulletproof: we containerize everything first, test it locally, then push to Kubernetes with zero surprises.

Why I Built This

I was deploying my first TensorFlow model to production and hit every possible roadblock. The model worked perfectly in Jupyter notebooks but crashed with cryptic errors in production.

My setup:

TensorFlow 2.20 trained model (image classification)
MacBook Pro M1 with 16GB RAM
Docker Desktop with Kubernetes enabled
Company requirement: everything must run on Kubernetes

What didn't work:

Copying model files directly to servers (missing dependencies everywhere)
Using generic Python containers (TensorFlow installation nightmares)
Following outdated tutorials from 2022 (breaking changes in Kubernetes)

Time wasted on wrong paths: 2 weeks of pure frustration

Step 1: Create Your TensorFlow Serving Container

The problem: TensorFlow models need specific versions of everything to work reliably.

My solution: Start with Google's official TensorFlow Serving image and add only what you need.

Time this saves: 6 hours of dependency debugging

Create your project structure:

mkdir tensorflow-k8s-deploy
cd tensorflow-k8s-deploy
mkdir model api config

What this does: Sets up a clean workspace for all our deployment files Expected output: Three empty folders ready for our files

Project structure setup in terminal My actual folder structure - yours should match exactly

Personal tip: "Keep everything in one folder - I learned this after losing model files across random directories"

Create the Dockerfile

# Use official TensorFlow Serving image
FROM tensorflow/serving:2.20.0

# Copy your trained model
COPY model/ /models/my-model/1/

# Set model name environment variable
ENV MODEL_NAME=my-model

# Expose the serving port
EXPOSE 8501

# TensorFlow Serving will auto-start with our model

What this does: Creates a container with TensorFlow Serving pre-configured for your model Expected output: A Dockerfile ready to build

Dockerfile contents in VS Code My Dockerfile in VS Code - the ENV line is crucial

Personal tip: "Version your model folders with numbers (1, 2, 3) - TensorFlow Serving uses this for hot-swapping models"

Add Your Model Files

Place your trained TensorFlow model in the model/ directory:

# Your trained model should be in SavedModel format
model/
├── saved_model.pb
├── variables/
│   ├── variables.data-00000-of-00001
│   └── variables.index
└── assets/ (if any)

What this does: Provides TensorFlow Serving with the model files it needs Expected output: Your model ready for containerization

Personal tip: "Use tf.saved_model.save() to export - it's the only format that works reliably with TensorFlow Serving"

Step 2: Build and Test Your Docker Container

The problem: You need to verify everything works before pushing to Kubernetes.

My solution: Build locally and test with real requests.

Time this saves: 2 hours of Kubernetes debugging

Build your container:

docker build -t my-tensorflow-model:v1 .

What this does: Creates your containerized TensorFlow model Expected output: Success message with image ID

Docker build output Successful Docker build - took 3 minutes on my M1 Mac

Run it locally to test:

docker run -p 8501:8501 my-tensorflow-model:v1

What this does: Starts TensorFlow Serving on port 8501 Expected output: "Exporting HTTP/REST API at:localhost:8501" message

TensorFlow Serving startup logs TensorFlow Serving ready - this message confirms everything loaded correctly

Test with a simple request:

curl -X POST http://localhost:8501/v1/models/my-model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}'

What this does: Sends test data to your model Expected output: JSON response with predictions

Personal tip: "If this curl fails, fix it now - Kubernetes won't magically make it work"

Step 3: Push to Container Registry

The problem: Kubernetes needs to pull your image from somewhere accessible.

My solution: Use Docker Hub (free) or your company's registry.

Time this saves: 30 minutes of registry authentication headaches

Tag your image:

docker tag my-tensorflow-model:v1 your-username/my-tensorflow-model:v1

Push to registry:

docker push your-username/my-tensorflow-model:v1

What this does: Makes your image available to Kubernetes Expected output: Upload progress bars, then "Pushed" confirmation

Docker push to registry Docker push in progress - 2.1GB upload took 4 minutes on my connection

Personal tip: "Use specific version tags (v1, v2) instead of 'latest' - it saves you from mysterious deployment failures"

Step 4: Create Kubernetes Deployment Config

The problem: Kubernetes YAML files are notorious for tiny errors that break everything.

My solution: Start with this tested configuration and modify minimally.

Time this saves: 4 hours of YAML debugging

Create config/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-model
  labels:
    app: tensorflow-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tensorflow-model
  template:
    metadata:
      labels:
        app: tensorflow-model
    spec:
      containers:
      - name: tensorflow-serving
        image: your-username/my-tensorflow-model:v1
        ports:
        - containerPort: 8501
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        readinessProbe:
          httpGet:
            path: /v1/models/my-model
            port: 8501
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /v1/models/my-model
            port: 8501
          initialDelaySeconds: 60
          periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: tensorflow-model-service
spec:
  selector:
    app: tensorflow-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8501
  type: LoadBalancer

What this does: Defines your TensorFlow deployment with proper health checks Expected output: A complete Kubernetes configuration

Kubernetes YAML in VS Code My deployment.yaml with proper indentation - spaces matter in YAML

Personal tip: "The readiness probe is crucial - without it, Kubernetes might send traffic to containers that aren't ready"

Step 5: Deploy to Kubernetes

The problem: First deployments usually fail due to resource issues or configuration errors.

My solution: Deploy step-by-step and verify each part works.

Time this saves: 2 hours of troubleshooting deployment failures

Apply your configuration:

kubectl apply -f config/deployment.yaml

What this does: Creates your deployment and service in Kubernetes Expected output: "deployment.apps/tensorflow-model created" messages

Kubectl apply output Successful Kubernetes deployment - both resources created

Check deployment status:

kubectl get deployments
kubectl get pods
kubectl get services

What this does: Shows you the current state of your deployment Expected output: Running pods and active service

Kubectl status check All pods running and service active - deployment successful

Wait for the LoadBalancer to get an external IP:

kubectl get service tensorflow-model-service --watch

What this does: Monitors until your service gets a public IP Expected output: IP address appears in EXTERNAL-IP column

Personal tip: "This can take 2-5 minutes depending on your cloud provider - grab coffee and wait"

Step 6: Test Your Production Deployment

The problem: You need to verify your model works exactly the same in production.

My solution: Run the same test requests against the Kubernetes service.

Time this saves: 1 hour of production debugging

Get your service IP:

kubectl get service tensorflow-model-service

Test your deployed model:

# Replace EXTERNAL-IP with your actual IP
curl -X POST http://EXTERNAL-IP/v1/models/my-model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}'

What this does: Verifies your model responds correctly in production Expected output: Same JSON response as your local test

Production model response Successful production test - identical response to local testing

Check your deployment health:

kubectl describe deployment tensorflow-model
kubectl logs -l app=tensorflow-model

What this does: Shows detailed deployment status and container logs Expected output: Healthy deployment with no error logs

Personal tip: "Save these curl commands in a script - you'll use them constantly for monitoring"

What You Just Built

You now have a bulletproof TensorFlow model running on Kubernetes with automatic scaling, health checks, and load balancing. Your model can handle production traffic and automatically restart if anything fails.

Key Takeaways (Save These)

Start with Docker first: Never debug Docker and Kubernetes issues simultaneously
Use official TensorFlow Serving: Saves 90% of deployment headaches compared to custom Python APIs
Test locally before Kubernetes: If curl doesn't work locally, Kubernetes won't fix it
Version everything: Tag your Docker images and model folders with explicit versions
Add health checks: Without readiness probes, you'll get random 503 errors in production

Your Next Steps

Pick one:

Beginner: Add monitoring with Prometheus to track model performance
Intermediate: Set up auto-scaling based on CPU/memory usage
Advanced: Implement A/B testing with multiple model versions

Tools I Actually Use

Docker Desktop: Local Kubernetes testing made simple
kubectl: Official Kubernetes CLI tool
TensorFlow Serving: Google's production ML serving system
Kubernetes Documentation: Most helpful when things break

Troubleshooting Common Issues

"ImagePullBackOff" error: Your image isn't accessible. Check your registry permissions and image name spelling.

Pods stuck in "Pending": Increase your cluster resources or reduce the memory/CPU requests in your deployment.

Model returns 404 errors: Check your model name in the ENV variable matches the URL path in your requests.

Connection refused errors: Verify your service type is LoadBalancer and wait for the external IP to appear.