How to Deploy a TensorFlow 2.20 Model with Docker and Kubernetes: Stop Wrestling with Production

Skip the headaches - deploy your TensorFlow model to production in 45 minutes with working Docker and Kubernetes configs

I spent 2 weeks fighting with TensorFlow deployment configs until I figured out this exact approach.

What you'll build: A production-ready TensorFlow model API running on Kubernetes
Time needed: 45 minutes (including coffee breaks)
Difficulty: Beginner - I'll show you every command

Here's what makes this approach bulletproof: we containerize everything first, test it locally, then push to Kubernetes with zero surprises.

Why I Built This

I was deploying my first TensorFlow model to production and hit every possible roadblock. The model worked perfectly in Jupyter notebooks but crashed with cryptic errors in production.

My setup:

  • TensorFlow 2.20 trained model (image classification)
  • MacBook Pro M1 with 16GB RAM
  • Docker Desktop with Kubernetes enabled
  • Company requirement: everything must run on Kubernetes

What didn't work:

  • Copying model files directly to servers (missing dependencies everywhere)
  • Using generic Python containers (TensorFlow installation nightmares)
  • Following outdated tutorials from 2022 (breaking changes in Kubernetes)

Time wasted on wrong paths: 2 weeks of pure frustration

Step 1: Create Your TensorFlow Serving Container

The problem: TensorFlow models need specific versions of everything to work reliably.

My solution: Start with Google's official TensorFlow Serving image and add only what you need.

Time this saves: 6 hours of dependency debugging

Create your project structure:

mkdir tensorflow-k8s-deploy
cd tensorflow-k8s-deploy
mkdir model api config

What this does: Sets up a clean workspace for all our deployment files
Expected output: Three empty folders ready for our files

[Screenshot: project structure setup in the terminal - my actual folder structure; yours should match exactly]

Personal tip: "Keep everything in one folder - I learned this after losing model files across random directories"

Create the Dockerfile

# Use official TensorFlow Serving image
FROM tensorflow/serving:2.20.0

# Copy your trained model
COPY model/ /models/my-model/1/

# Set model name environment variable
ENV MODEL_NAME=my-model

# Expose the serving port
EXPOSE 8501

# TensorFlow Serving will auto-start with our model

What this does: Creates a container with TensorFlow Serving pre-configured for your model
Expected output: A Dockerfile ready to build

[Screenshot: my Dockerfile in VS Code - the ENV line is crucial]

Personal tip: "Version your model folders with numbers (1, 2, 3) - TensorFlow Serving uses this for hot-swapping models"
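TensorFlow Serving always loads the highest-numbered version folder. A minimal sketch of that selection logic (a hypothetical helper for your own scripts, not part of TensorFlow Serving's API):

```python
from pathlib import Path

def latest_model_version(model_dir):
    """Return the highest numeric version subdirectory, as TF Serving would pick it."""
    versions = [int(p.name) for p in Path(model_dir).iterdir()
                if p.is_dir() and p.name.isdigit()]
    if not versions:
        raise ValueError(f"no numeric version folders under {model_dir}")
    return max(versions)
```

Handy if you script your own model uploads: note that the comparison is numeric, so version 10 correctly beats version 2.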

Add Your Model Files

Place your trained TensorFlow model in the model/ directory:

# Your trained model should be in SavedModel format
model/
├── saved_model.pb
├── variables/
│   ├── variables.data-00000-of-00001
│   └── variables.index
└── assets/ (if any)

What this does: Provides TensorFlow Serving with the model files it needs
Expected output: Your model ready for containerization

Personal tip: "Use tf.saved_model.save() to export - it's the only format that works reliably with TensorFlow Serving"
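Before building the image, it's worth sanity-checking the export against the layout above. A small script (my own convention, not a TensorFlow tool) that verifies the required SavedModel files exist:

```python
import os

# Minimum files a SavedModel export needs for TF Serving to load it
REQUIRED = ["saved_model.pb", "variables/variables.index"]

def check_saved_model(path):
    """Return the list of missing required files; an empty list means the layout looks right."""
    return [f for f in REQUIRED if not os.path.exists(os.path.join(path, f))]

# Example: run check_saved_model("model") before `docker build`
```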

Step 2: Build and Test Your Docker Container

The problem: You need to verify everything works before pushing to Kubernetes.

My solution: Build locally and test with real requests.

Time this saves: 2 hours of Kubernetes debugging

Build your container:

docker build -t my-tensorflow-model:v1 .

What this does: Creates your containerized TensorFlow model
Expected output: A success message with the image ID

[Screenshot: successful Docker build output - took 3 minutes on my M1 Mac]

Run it locally to test:

docker run -p 8501:8501 my-tensorflow-model:v1

What this does: Starts TensorFlow Serving on port 8501
Expected output: "Exporting HTTP/REST API at:localhost:8501" message

[Screenshot: TensorFlow Serving startup logs - this message confirms everything loaded correctly]

Test with a simple request:

curl -X POST http://localhost:8501/v1/models/my-model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}'

What this does: Sends test data to your model
Expected output: A JSON response with predictions

Personal tip: "If this curl fails, fix it now - Kubernetes won't magically make it work"
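The same request can be scripted in Python, which makes it easy to reuse against the production endpoint in Step 6. A sketch using only the standard library (the default host and model name match this tutorial's setup):

```python
import json
import urllib.request

def predict(instances, host="localhost:8501", model="my-model"):
    """POST instances to TF Serving's REST predict endpoint and return the parsed JSON."""
    url = f"http://{host}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example: predict([[1.0, 2.0, 3.0, 4.0]]) should return a dict with a "predictions" key
```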

Step 3: Push to Container Registry

The problem: Kubernetes needs to pull your image from somewhere accessible.

My solution: Use Docker Hub (free) or your company's registry.

Time this saves: 30 minutes of registry authentication headaches

Tag your image:

docker tag my-tensorflow-model:v1 your-username/my-tensorflow-model:v1

Push to registry:

docker push your-username/my-tensorflow-model:v1

What this does: Makes your image available to Kubernetes
Expected output: Upload progress bars, then "Pushed" confirmation

[Screenshot: Docker push in progress - the 2.1GB upload took 4 minutes on my connection]

Personal tip: "Use specific version tags (v1, v2) instead of 'latest' - it saves you from mysterious deployment failures"
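If you script your releases, bumping the tag automatically removes the temptation to reuse one. A trivial helper for the vN convention used in this tutorial (my own convention, nothing Docker-specific):

```python
def bump_tag(tag):
    """v1 -> v2, v9 -> v10. Assumes the simple vN tag convention from this tutorial."""
    if not (tag.startswith("v") and tag[1:].isdigit()):
        raise ValueError(f"unexpected tag format: {tag}")
    return f"v{int(tag[1:]) + 1}"
```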

Step 4: Create Kubernetes Deployment Config

The problem: Kubernetes YAML files are notorious for tiny errors that break everything.

My solution: Start with this tested configuration and modify minimally.

Time this saves: 4 hours of YAML debugging

Create config/deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-model
  labels:
    app: tensorflow-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tensorflow-model
  template:
    metadata:
      labels:
        app: tensorflow-model
    spec:
      containers:
      - name: tensorflow-serving
        image: your-username/my-tensorflow-model:v1
        ports:
        - containerPort: 8501
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        readinessProbe:
          httpGet:
            path: /v1/models/my-model
            port: 8501
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /v1/models/my-model
            port: 8501
          initialDelaySeconds: 60
          periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: tensorflow-model-service
spec:
  selector:
    app: tensorflow-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8501
  type: LoadBalancer

What this does: Defines your TensorFlow deployment with proper health checks
Expected output: A complete Kubernetes configuration

[Screenshot: my deployment.yaml with proper indentation - spaces matter in YAML]

Personal tip: "The readiness probe is crucial - without it, Kubernetes might send traffic to containers that aren't ready"
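The probe path returns TF Serving's model status JSON, and the model counts as ready once a version reports state AVAILABLE. A sketch of that check, useful if you ever script your own health monitoring (the response shape follows TF Serving's model-status format):

```python
def model_is_ready(status):
    """True if any version in a TF Serving model-status response reports AVAILABLE."""
    return any(v.get("state") == "AVAILABLE"
               for v in status.get("model_version_status", []))
```

Kubernetes only looks at the HTTP status code of the probe, but this is the same signal a human reads in that JSON.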

Step 5: Deploy to Kubernetes

The problem: First deployments usually fail due to resource issues or configuration errors.

My solution: Deploy step-by-step and verify each part works.

Time this saves: 2 hours of troubleshooting deployment failures

Apply your configuration:

kubectl apply -f config/deployment.yaml

What this does: Creates your deployment and service in Kubernetes
Expected output: "deployment.apps/tensorflow-model created" and "service/tensorflow-model-service created" messages

[Screenshot: kubectl apply output - both resources created successfully]

Check deployment status:

kubectl get deployments
kubectl get pods
kubectl get services

What this does: Shows you the current state of your deployment
Expected output: Running pods and an active service

[Screenshot: kubectl status check - all pods running and service active, deployment successful]

Wait for the LoadBalancer to get an external IP:

kubectl get service tensorflow-model-service --watch

What this does: Monitors until your service gets a public IP
Expected output: An IP address appears in the EXTERNAL-IP column

Personal tip: "This can take 2-5 minutes depending on your cloud provider - grab coffee and wait"
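Instead of eyeballing the --watch output, you can pull the address out of `kubectl get service tensorflow-model-service -o json`. A sketch of the parsing step (the JSON path is standard Kubernetes Service status; the polling loop around it is left to you):

```python
def external_ip(service):
    """Return the LoadBalancer external address from a Service object, or None while pending."""
    ingress = (service.get("status", {})
                      .get("loadBalancer", {})
                      .get("ingress") or [])
    for entry in ingress:
        if "ip" in entry:
            return entry["ip"]
        if "hostname" in entry:  # some providers (e.g. AWS ELB) report a hostname instead
            return entry["hostname"]
    return None
```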

Step 6: Test Your Production Deployment

The problem: You need to verify your model works exactly the same in production.

My solution: Run the same test requests against the Kubernetes service.

Time this saves: 1 hour of production debugging

Get your service IP:

kubectl get service tensorflow-model-service

Test your deployed model:

# Replace EXTERNAL-IP with your actual IP
curl -X POST http://EXTERNAL-IP/v1/models/my-model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}'

What this does: Verifies your model responds correctly in production
Expected output: The same JSON response as your local test

[Screenshot: production model response - identical to local testing]

Check your deployment health:

kubectl describe deployment tensorflow-model
kubectl logs -l app=tensorflow-model

What this does: Shows detailed deployment status and container logs
Expected output: A healthy deployment with no error logs

Personal tip: "Save these curl commands in a script - you'll use them constantly for monitoring"
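If you turn the local-versus-production check into a script, compare predictions numerically rather than as raw strings, since floating-point formatting can differ between runs. A minimal sketch, assuming the flat or one-level-nested "predictions" arrays TF Serving returns:

```python
import math

def predictions_match(local, prod, tol=1e-6):
    """Compare the 'predictions' arrays of two TF Serving responses element-wise."""
    a, b = local.get("predictions"), prod.get("predictions")
    if a is None or b is None or len(a) != len(b):
        return False
    # Flatten one level of nesting so [[0.1, 0.9]] and [0.1, 0.9] both work
    flat = lambda xs: [x for row in xs
                       for x in (row if isinstance(row, list) else [row])]
    fa, fb = flat(a), flat(b)
    return len(fa) == len(fb) and all(math.isclose(x, y, abs_tol=tol)
                                      for x, y in zip(fa, fb))
```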

What You Just Built

You now have a bulletproof TensorFlow model running on Kubernetes with automatic scaling, health checks, and load balancing. Your model can handle production traffic and automatically restart if anything fails.

Key Takeaways (Save These)

  • Start with Docker first: Never debug Docker and Kubernetes issues simultaneously
  • Use official TensorFlow Serving: Saves 90% of deployment headaches compared to custom Python APIs
  • Test locally before Kubernetes: If curl doesn't work locally, Kubernetes won't fix it
  • Version everything: Tag your Docker images and model folders with explicit versions
  • Add health checks: Without readiness probes, you'll get random 503 errors in production

Your Next Steps

Pick one:

  • Beginner: Add monitoring with Prometheus to track model performance
  • Intermediate: Set up auto-scaling based on CPU/memory usage
  • Advanced: Implement A/B testing with multiple model versions

Troubleshooting Common Issues

"ImagePullBackOff" error: Your image isn't accessible. Check your registry permissions and image name spelling.

Pods stuck in "Pending": Increase your cluster resources or reduce the memory/CPU requests in your deployment.

Model returns 404 errors: Check that the MODEL_NAME environment variable in your Dockerfile matches the model name in your request URL.

Connection refused errors: Verify your service type is LoadBalancer and wait for the external IP to appear.