Deploy AI Models to iOS with Core ML in 20 Minutes

Convert PyTorch or TensorFlow models to Core ML format and run them on-device in your iOS app with no internet required.

Problem: Your AI Model Won't Run On-Device

You've trained a model in Python and now need it running in an iOS app — fast, offline, with no cloud calls. Core ML is Apple's answer, but the conversion pipeline trips up most developers the first time.

You'll learn:

  • How to convert PyTorch or TensorFlow models to .mlpackage format
  • How to add and run the model in a Swift iOS project
  • How to debug common shape and type mismatch errors

Time: 20 min | Level: Intermediate


Why This Happens

Core ML requires models in Apple's proprietary format (.mlmodel or .mlpackage). Your PyTorch .pt or TensorFlow SavedModel can't be loaded directly — you need coremltools to bridge the gap.

The conversion isn't always smooth. Models trained with dynamic shapes, custom ops, or unsupported layers fail silently or produce wrong outputs if you skip the tracing and validation steps.

Common symptoms:

  • coremltools raises ValueError: Unsupported op during conversion
  • Model loads in Xcode but predictions crash at runtime
  • Output tensor shape doesn't match what your Swift code expects

Solution

Step 1: Install coremltools and Export Your Model

pip install coremltools==8.0 torch torchvision

Trace your PyTorch model before converting. Core ML needs a static graph, not a dynamic one.

import torch
import coremltools as ct

# Load your trained model
model = YourModel()
model.load_state_dict(torch.load("model.pth"))
model.eval()

# Trace with a representative input — shape must match real inputs exactly
example_input = torch.rand(1, 3, 224, 224)  # batch=1, RGB, 224x224
traced = torch.jit.trace(model, example_input)

# Convert to Core ML
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example_input.shape)],
    minimum_deployment_target=ct.target.iOS17,
    compute_precision=ct.precision.FLOAT16,  # Smaller, faster on Apple Silicon
)

mlmodel.save("YourModel.mlpackage")
print("Saved.")

Expected: A YourModel.mlpackage folder appears in your working directory.

If it fails:

  • Unsupported op: aten::...: The op isn't in Core ML's supported set. Register a composite op with coremltools' @register_torch_op decorator, or simplify the layer in PyTorch before tracing.
  • TracerWarning: Converting a tensor to Python...: Your model has control flow (if/else) that can't be traced. Use torch.jit.script instead of trace.
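To see why scripting helps, here is a minimal sketch of a module with data-dependent control flow. The Gate module and its threshold are hypothetical stand-ins for whatever branch your real model takes:

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    """Hypothetical module with data-dependent control flow."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # trace() would bake in whichever branch the example input happens
        # to take; script() keeps both branches in the graph
        if x.sum() > 0:
            return torch.relu(x)
        return -x

scripted = torch.jit.script(Gate())

print(scripted(torch.ones(2, 2)))    # positive input exercises the relu branch
print(scripted(-torch.ones(2, 2)))   # negative input exercises the negation branch
```

Pass the scripted module to ct.convert exactly as you would a traced one; the rest of Step 1 is unchanged.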

Step 2: Validate the Converted Model

Don't skip this. Run a prediction in Python before touching Xcode.

import numpy as np
import coremltools as ct

mlmodel = ct.models.MLModel("YourModel.mlpackage")

# Use the same shape as your trace input
test_input = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
output = mlmodel.predict(test_input)

print(output)  # Should show dict with your output key and expected shape

Expected: A dict like {"output": array([[0.1, 0.7, 0.2]])} with the right class count.

If the values look wrong: Check that you're applying the same normalization in Python as your original training pipeline. Forgetting to normalize is the #1 source of garbage outputs.
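As a sketch of what "same normalization" means in practice, here is a preprocessing helper assuming your training pipeline used torchvision's standard ImageNet statistics — substitute your own mean and std if they differ:

```python
import numpy as np

# torchvision's standard ImageNet statistics — an assumption; replace with
# whatever your training pipeline actually used
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1)

def preprocess(image_uint8: np.ndarray) -> np.ndarray:
    """Scale 0-255 pixels to [0, 1], then apply the training normalization."""
    x = image_uint8.astype(np.float32) / 255.0
    return (x - MEAN) / STD

# Feed the normalized array to mlmodel.predict, not the raw pixels
frame = np.random.randint(0, 256, size=(1, 3, 224, 224), dtype=np.uint8)
test_input = {"input": preprocess(frame)}
```

If your Python validation only matches the original model after adding this step, you must replicate the same preprocessing on the Swift side (or bake it into the model with coremltools' ImageType scale/bias options).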

[Screenshot: terminal showing the coremltools prediction output — an output dict with the correct shape confirms the conversion worked before you open Xcode]


Step 3: Add the Model to Your Xcode Project

Drag YourModel.mlpackage into your Xcode project. Make sure "Add to target" is checked.

Xcode auto-generates a Swift class (YourModel) with typed input/output structs. You don't write this — it's generated for you.

import CoreML
import Vision

// Load model (do this once, not per-frame)
guard let model = try? YourModel(configuration: MLModelConfiguration()) else {
    fatalError("Failed to load Core ML model")
}

func runInference(on pixelBuffer: CVPixelBuffer) -> [Float]? {
    // YourModelInput is auto-generated by Xcode; the argument label matches the
    // input name you set in coremltools. A CVPixelBuffer initializer is only
    // generated if you converted with ct.ImageType — a TensorType input takes an MLMultiArray
    guard let input = try? YourModelInput(input: pixelBuffer) else { return nil }
    
    guard let output = try? model.prediction(input: input) else { return nil }
    
    // Access the output by the name you set in coremltools (e.g., "output").
    // It's an MLMultiArray, which isn't a Sequence — index it explicitly
    let multiArray = output.output
    return (0..<multiArray.count).map { Float(truncating: multiArray[$0]) }
}

Why load once: MLModelConfiguration setup and model compilation happen at init time. Loading per-frame adds 100–400ms of latency you don't want.


Step 4: Convert Camera Frames to the Right Format

Most models expect CVPixelBuffer, not UIImage. Use Vision to handle the resize and format conversion automatically.

import Vision

func classifyImage(_ image: UIImage) {
    guard let cgImage = image.cgImage else { return }
    
    // VNCoreMLRequest handles resize + pixel format automatically
    guard let coreMLModel = try? VNCoreMLModel(for: YourModel(configuration: MLModelConfiguration()).model) else { return }
    
    let request = VNCoreMLRequest(model: coreMLModel) { request, error in
        guard let results = request.results as? [VNClassificationObservation] else { return }
        
        // Results are sorted by confidence, highest first
        if let top = results.first {
            print("\(top.identifier): \(top.confidence)")
        }
    }
    
    request.imageCropAndScaleOption = .centerCrop  // Match your training crop strategy
    
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}

If it fails:

  • Confidence is always ~0 or ~1: Your model outputs raw logits, not probabilities. Add a softmax layer before saving in Python: torch.nn.Softmax(dim=1).
  • Wrong class predicted: Confirm your label order matches training. Vision sorts VNClassificationObservation results by confidence, not by class index.
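One way to fix the logits problem on the Python side is to wrap the trained network in a module that appends softmax, then trace and convert the wrapper instead. A minimal sketch — the Linear stand-in is hypothetical, substitute your real classifier:

```python
import torch
import torch.nn as nn

class WithSoftmax(nn.Module):
    """Wrap a classifier so the converted model emits probabilities, not logits."""
    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.softmax(self.base(x))

# Hypothetical stand-in for your trained classifier
base = nn.Linear(4, 3)
wrapped = WithSoftmax(base).eval()

# Trace and convert `wrapped` instead of `base` in Step 1
probs = wrapped(torch.rand(1, 4))
print(probs.sum(dim=1))  # each row sums to 1
```

Baking softmax into the model keeps the Swift side simple: confidence values arrive ready to display, with no post-processing per platform.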

Verification

Build and run on a physical device (the Neural Engine only activates on real hardware, not in the Simulator).

// Add this to measure inference time
let start = Date()
try? handler.perform([request])
let elapsed = Date().timeIntervalSince(start) * 1000
print("Inference: \(Int(elapsed))ms")

You should see: Inference under 10ms for MobileNet-sized models on iPhone 14+. If you're seeing 200ms+, check that compute_precision is FLOAT16 in your conversion step and that MLModelConfiguration has no overrides forcing CPU.

[Screenshot: Xcode console showing the inference time output — sub-10ms inference confirms the Neural Engine is active and the model loaded correctly]


What You Learned

  • PyTorch models must be traced (or scripted) before Core ML conversion — dynamic graphs don't convert
  • Validate in Python with mlmodel.predict() before opening Xcode to catch shape and normalization issues early
  • Use VNCoreMLRequest instead of calling the model directly — it handles pixel format conversion for you
  • Load the model once at app startup, not per-inference call

Limitations: Core ML doesn't support every PyTorch op. If your model uses custom CUDA kernels or unusual attention variants, you'll need to refactor or find an equivalent supported op. Check Apple's op compatibility list before starting conversion on complex architectures.

When NOT to use this: If your model is over 500MB or requires server-side state (RAG, tool-calling agents), on-device isn't practical. Use a Swift HTTP client that calls a local-network or cloud inference endpoint instead.


Tested on coremltools 8.0, PyTorch 2.5, Xcode 16.2, iOS 17+, iPhone 15 Pro