I spent 2 weeks fighting with TensorFlow deployment configs until I figured out this exact approach.
What you'll build: A production-ready TensorFlow model API running on Kubernetes Time needed: 45 minutes (including coffee breaks) Difficulty: Beginner - I'll show you every command
Here's what makes this approach bulletproof: we containerize everything first, test it locally, then push to Kubernetes with zero surprises.
Why I Built This
I was deploying my first TensorFlow model to production and hit every possible roadblock. The model worked perfectly in Jupyter notebooks but crashed with cryptic errors in production.
My setup:
- TensorFlow 2.20 trained model (image classification)
- MacBook Pro M1 with 16GB RAM
- Docker Desktop with Kubernetes enabled
- Company requirement: everything must run on Kubernetes
What didn't work:
- Copying model files directly to servers (missing dependencies everywhere)
- Using generic Python containers (TensorFlow installation nightmares)
- Following outdated tutorials from 2022 (breaking changes in Kubernetes)
Time wasted on wrong paths: 2 weeks of pure frustration
Step 1: Create Your TensorFlow Serving Container
The problem: TensorFlow models need specific versions of everything to work reliably.
My solution: Start with Google's official TensorFlow Serving image and add only what you need.
Time this saves: 6 hours of dependency debugging
Create your project structure:
mkdir tensorflow-k8s-deploy
cd tensorflow-k8s-deploy
mkdir model api config
What this does: Sets up a clean workspace for all our deployment files Expected output: Three empty folders ready for our files
My actual folder structure - yours should match exactly
Personal tip: "Keep everything in one folder - I learned this after losing model files across random directories"
Create the Dockerfile
# Use official TensorFlow Serving image
FROM tensorflow/serving:2.20.0
# Copy your trained model
COPY model/ /models/my-model/1/
# Set model name environment variable
ENV MODEL_NAME=my-model
# Expose the serving port
EXPOSE 8501
# TensorFlow Serving will auto-start with our model
What this does: Creates a container with TensorFlow Serving pre-configured for your model Expected output: A Dockerfile ready to build
My Dockerfile in VS Code - the ENV line is crucial
Personal tip: "Version your model folders with numbers (1, 2, 3) - TensorFlow Serving uses this for hot-swapping models"
Add Your Model Files
Place your trained TensorFlow model in the model/ directory:
# Your trained model should be in SavedModel format
model/
├── saved_model.pb
├── variables/
│ ├── variables.data-00000-of-00001
│ └── variables.index
└── assets/ (if any)
What this does: Provides TensorFlow Serving with the model files it needs Expected output: Your model ready for containerization
Personal tip: "Use tf.saved_model.save() to export - it's the only format that works reliably with TensorFlow Serving"
Step 2: Build and Test Your Docker Container
The problem: You need to verify everything works before pushing to Kubernetes.
My solution: Build locally and test with real requests.
Time this saves: 2 hours of Kubernetes debugging
Build your container:
docker build -t my-tensorflow-model:v1 .
What this does: Creates your containerized TensorFlow model Expected output: Success message with image ID
Successful Docker build - took 3 minutes on my M1 Mac
Run it locally to test:
docker run -p 8501:8501 my-tensorflow-model:v1
What this does: Starts TensorFlow Serving on port 8501 Expected output: "Exporting HTTP/REST API at:localhost:8501" message
TensorFlow Serving ready - this message confirms everything loaded correctly
Test with a simple request:
curl -X POST http://localhost:8501/v1/models/my-model:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}'
What this does: Sends test data to your model Expected output: JSON response with predictions
Personal tip: "If this curl fails, fix it now - Kubernetes won't magically make it work"
Step 3: Push to Container Registry
The problem: Kubernetes needs to pull your image from somewhere accessible.
My solution: Use Docker Hub (free) or your company's registry.
Time this saves: 30 minutes of registry authentication headaches
Tag your image:
docker tag my-tensorflow-model:v1 your-username/my-tensorflow-model:v1
Push to registry:
docker push your-username/my-tensorflow-model:v1
What this does: Makes your image available to Kubernetes Expected output: Upload progress bars, then "Pushed" confirmation
Docker push in progress - 2.1GB upload took 4 minutes on my connection
Personal tip: "Use specific version tags (v1, v2) instead of 'latest' - it saves you from mysterious deployment failures"
Step 4: Create Kubernetes Deployment Config
The problem: Kubernetes YAML files are notorious for tiny errors that break everything.
My solution: Start with this tested configuration and modify minimally.
Time this saves: 4 hours of YAML debugging
Create config/deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: tensorflow-model
labels:
app: tensorflow-model
spec:
replicas: 2
selector:
matchLabels:
app: tensorflow-model
template:
metadata:
labels:
app: tensorflow-model
spec:
containers:
- name: tensorflow-serving
image: your-username/my-tensorflow-model:v1
ports:
- containerPort: 8501
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
readinessProbe:
httpGet:
path: /v1/models/my-model
port: 8501
initialDelaySeconds: 30
periodSeconds: 10
livenessProbe:
httpGet:
path: /v1/models/my-model
port: 8501
initialDelaySeconds: 60
periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
name: tensorflow-model-service
spec:
selector:
app: tensorflow-model
ports:
- protocol: TCP
port: 80
targetPort: 8501
type: LoadBalancer
What this does: Defines your TensorFlow deployment with proper health checks Expected output: A complete Kubernetes configuration
My deployment.yaml with proper indentation - spaces matter in YAML
Personal tip: "The readiness probe is crucial - without it, Kubernetes might send traffic to containers that aren't ready"
Step 5: Deploy to Kubernetes
The problem: First deployments usually fail due to resource issues or configuration errors.
My solution: Deploy step-by-step and verify each part works.
Time this saves: 2 hours of troubleshooting deployment failures
Apply your configuration:
kubectl apply -f config/deployment.yaml
What this does: Creates your deployment and service in Kubernetes Expected output: "deployment.apps/tensorflow-model created" messages
Successful Kubernetes deployment - both resources created
Check deployment status:
kubectl get deployments
kubectl get pods
kubectl get services
What this does: Shows you the current state of your deployment Expected output: Running pods and active service
All pods running and service active - deployment successful
Wait for the LoadBalancer to get an external IP:
kubectl get service tensorflow-model-service --watch
What this does: Monitors until your service gets a public IP Expected output: IP address appears in EXTERNAL-IP column
Personal tip: "This can take 2-5 minutes depending on your cloud provider - grab coffee and wait"
Step 6: Test Your Production Deployment
The problem: You need to verify your model works exactly the same in production.
My solution: Run the same test requests against the Kubernetes service.
Time this saves: 1 hour of production debugging
Get your service IP:
kubectl get service tensorflow-model-service
Test your deployed model:
# Replace EXTERNAL-IP with your actual IP
curl -X POST http://EXTERNAL-IP/v1/models/my-model:predict \
-H "Content-Type: application/json" \
-d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}'
What this does: Verifies your model responds correctly in production Expected output: Same JSON response as your local test
Successful production test - identical response to local testing
Check your deployment health:
kubectl describe deployment tensorflow-model
kubectl logs -l app=tensorflow-model
What this does: Shows detailed deployment status and container logs Expected output: Healthy deployment with no error logs
Personal tip: "Save these curl commands in a script - you'll use them constantly for monitoring"
What You Just Built
You now have a bulletproof TensorFlow model running on Kubernetes with automatic scaling, health checks, and load balancing. Your model can handle production traffic and automatically restart if anything fails.
Key Takeaways (Save These)
- Start with Docker first: Never debug Docker and Kubernetes issues simultaneously
- Use official TensorFlow Serving: Saves 90% of deployment headaches compared to custom Python APIs
- Test locally before Kubernetes: If curl doesn't work locally, Kubernetes won't fix it
- Version everything: Tag your Docker images and model folders with explicit versions
- Add health checks: Without readiness probes, you'll get random 503 errors in production
Your Next Steps
Pick one:
- Beginner: Add monitoring with Prometheus to track model performance
- Intermediate: Set up auto-scaling based on CPU/memory usage
- Advanced: Implement A/B testing with multiple model versions
Tools I Actually Use
- Docker Desktop: Local Kubernetes testing made simple
- kubectl: Official Kubernetes CLI tool
- TensorFlow Serving: Google's production ML serving system
- Kubernetes Documentation: Most helpful when things break
Troubleshooting Common Issues
"ImagePullBackOff" error: Your image isn't accessible. Check your registry permissions and image name spelling.
Pods stuck in "Pending": Increase your cluster resources or reduce the memory/CPU requests in your deployment.
Model returns 404 errors: Check your model name in the ENV variable matches the URL path in your requests.
Connection refused errors: Verify your service type is LoadBalancer and wait for the external IP to appear.