Problem: Your Fine-Tuned Weights Are Worth Stealing
You spent weeks and thousands of dollars fine-tuning a model. Now it's deployed — and anyone with API access, a compromised cloud account, or physical server access can walk away with your weights.
You'll learn:
- How to encrypt weights at rest and in transit
- How to implement access controls that don't kill inference speed
- How to watermark weights so theft is provable in court
Time: 25 min | Level: Intermediate
Why This Happens
Fine-tuned weights are just files — .safetensors, .bin, or .pt — and they're typically stored on disk with the same permissions as any other file. Most teams focus on securing the API endpoint and forget the weights themselves.
Common attack vectors:
- Direct file system access via misconfigured cloud storage (S3, GCS, Azure Blob)
- Insider threat — employees or contractors with SSH access
- Model extraction attacks — querying the API repeatedly to reconstruct weights
- Compromised CI/CD pipelines that pull weights during deployment
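Before reaching for encryption, it's worth auditing what's already exposed on the box itself. A minimal stdlib-only sketch (the function name and the idea of scanning a weights directory are illustrative, not part of any library) that flags checkpoint files readable by group or other users:

```python
import os
import stat
from pathlib import Path

def audit_weight_permissions(weights_dir: str) -> list[str]:
    """Return weight files whose POSIX mode allows group/other read access."""
    exposed = []
    for path in Path(weights_dir).rglob("*"):
        if path.suffix not in {".safetensors", ".bin", ".pt"}:
            continue
        mode = path.stat().st_mode
        # S_IRGRP / S_IROTH: readable by the group / by everyone
        if mode & (stat.S_IRGRP | stat.S_IROTH):
            exposed.append(str(path))
    return sorted(exposed)
```

Anything this returns is one `scp` away from leaving the machine, which is exactly the gap the steps below close.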
Solution
Step 1: Encrypt Weights at Rest
Never store raw weights in cloud storage. Encrypt before upload, decrypt only in memory at inference time.
from cryptography.fernet import Fernet
import torch
import io

def encrypt_weights(model_path: str, key: bytes) -> bytes:
    # Re-serialize the checkpoint into an in-memory buffer
    buffer = io.BytesIO()
    model = torch.load(model_path, map_location="cpu")
    torch.save(model, buffer)
    buffer.seek(0)
    fernet = Fernet(key)
    # Encrypt the raw bytes; the key stays in your secrets manager
    return fernet.encrypt(buffer.read())
def load_encrypted_weights(encrypted_path: str, key: bytes):
    fernet = Fernet(key)
    with open(encrypted_path, "rb") as f:
        decrypted = fernet.decrypt(f.read())
    buffer = io.BytesIO(decrypted)
    # Load directly from memory; decrypted weights never touch disk
    return torch.load(buffer, map_location="cpu")
Generate and store your key securely:
from cryptography.fernet import Fernet
# Generate once — store in AWS Secrets Manager, Vault, or GCP Secret Manager
key = Fernet.generate_key()
print(key.decode()) # Save this to your secrets manager NOW
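At runtime, the key should be injected by your orchestrator rather than baked into the image or repo. A minimal sketch, assuming the secrets manager exposes it as an environment variable named MODEL_WEIGHTS_KEY (the variable name is an assumption, not a convention):

```python
import os

def get_weights_key() -> bytes:
    """Fetch the Fernet key injected by the secrets manager at deploy time."""
    key = os.environ.get("MODEL_WEIGHTS_KEY")
    if not key:
        # Fail closed: refuse to start without the key rather than
        # falling back to unencrypted weights
        raise RuntimeError("MODEL_WEIGHTS_KEY is not set; refusing to start")
    return key.encode()
```

Failing closed matters here: a service that silently starts without its key tends to get "fixed" by someone copying raw weights onto the box.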
Expected: Your .safetensors or .pt file is replaced with an opaque encrypted blob. Raw weights are never on disk in production.
If it fails:
- InvalidToken error: key mismatch. Confirm you are pulling the exact same key that was used during encryption.
- OOM during decrypt: stream the decryption in chunks for models larger than ~10 GB.
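Fernet tokens are all-or-nothing, so "stream in chunks" means framing the file as a sequence of independently encrypted chunks. A sketch of one way to do it; the 64 MB chunk size and the length-prefix framing are my choices, not a standard format:

```python
import io
import struct
from cryptography.fernet import Fernet

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB of plaintext per Fernet token

def encrypt_stream(src, dst, key: bytes) -> None:
    # Each chunk becomes its own Fernet token, length-prefixed so the
    # reader knows where one token ends and the next begins
    fernet = Fernet(key)
    while chunk := src.read(CHUNK_SIZE):
        token = fernet.encrypt(chunk)
        dst.write(struct.pack(">Q", len(token)))
        dst.write(token)

def decrypt_stream(src, dst, key: bytes) -> None:
    fernet = Fernet(key)
    while header := src.read(8):
        (token_len,) = struct.unpack(">Q", header)
        dst.write(fernet.decrypt(src.read(token_len)))
```

Peak memory is now bounded by CHUNK_SIZE rather than by model size, which is what keeps a 70B checkpoint from OOM-ing the decrypt step.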
Step 2: Lock Down Cloud Storage
Encrypted weights are useless if the bucket is public. Lock it down at the infrastructure level.
# AWS: Block all public access — do this first
aws s3api put-public-access-block \
--bucket your-model-bucket \
--public-access-block-configuration \
"BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
# Then apply a least-privilege bucket policy
aws s3api put-bucket-policy \
--bucket your-model-bucket \
--policy file://bucket-policy.json
bucket-policy.json — only your inference service role can read:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "InferenceServiceOnly",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_ACCOUNT:role/inference-service-role"
      },
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::your-model-bucket/weights/*"
    },
    {
      "Sid": "DenyEverythingElse",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your-model-bucket",
        "arn:aws:s3:::your-model-bucket/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::YOUR_ACCOUNT:role/inference-service-role"
        }
      }
    }
  ]
}
Expected: Only your inference service IAM role can read weights. Root account access still works (for break-glass scenarios), but no one else can list or download files.
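As defense in depth, you can also enable default server-side encryption on the bucket, so even a raw weight file uploaded by mistake lands KMS-encrypted. A sketch using the AWS CLI; the KMS key ARN is a placeholder you'd replace with your own:

```shell
# Default SSE-KMS for all new objects in the bucket
aws s3api put-bucket-encryption \
  --bucket your-model-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:us-east-1:YOUR_ACCOUNT:key/YOUR_KEY_ID"
      }
    }]
  }'
```

This doesn't replace the client-side Fernet layer from Step 1; SSE-KMS protects against storage-layer exposure, while client-side encryption protects against anyone who can read the bucket.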
Step 3: Watermark Your Weights
Encryption stops casual theft. Watermarking proves ownership if your weights end up somewhere they shouldn't.
The technique: embed a hidden signal in the model's parameter space that survives fine-tuning and quantization. The signal is detectable only with your private key.
import torch
import hashlib
import numpy as np

def _watermark_rng(owner_id: str, secret_key: str) -> np.random.Generator:
    # Same seed for embed and verify, derived from the owner ID and secret key
    digest = hashlib.sha256(f"{owner_id}:{secret_key}".encode()).hexdigest()
    return np.random.default_rng(seed=int(digest, 16) % (2**32))

def embed_watermark(
    model: torch.nn.Module,
    owner_id: str,
    secret_key: str,
    strength: float = 0.001,  # Low strength = invisible to performance benchmarks
) -> torch.nn.Module:
    # Deterministically regenerate the same noise sequence at verify time
    rng = _watermark_rng(owner_id, secret_key)
    for name, param in model.named_parameters():
        if "weight" in name and param.dim() >= 2:
            # Owner-specific noise pattern, unique per layer
            noise = torch.tensor(
                rng.normal(0, strength, param.shape),
                dtype=param.dtype,
                device=param.device,
            )
            # Embed noise into weights: imperceptible but detectable
            param.data += noise
    return model

def verify_watermark(
    model: torch.nn.Module,
    owner_id: str,
    secret_key: str,
    strength: float = 0.001,
    threshold: float = 0.5,  # Projection score is ~1.0 if marked, ~0.0 if not
) -> tuple[bool, float]:
    rng = _watermark_rng(owner_id, secret_key)
    scores = []
    for name, param in model.named_parameters():
        if "weight" in name and param.dim() >= 2:
            expected = torch.tensor(
                rng.normal(0, strength, param.shape), dtype=torch.float32
            ).flatten()
            actual = param.data.cpu().float().flatten()
            # Raw correlation against the full weights is swamped by the
            # original parameters; the normalized projection (W . n) / (n . n)
            # is ~1.0 when the noise was added and ~0.0 on an unrelated model
            scores.append(((actual @ expected) / (expected @ expected)).item())
    # High mean score = your watermark is present
    mean_score = float(np.mean(scores))
    return mean_score > threshold, mean_score
# Usage
model = load_your_model()
watermarked = embed_watermark(model, owner_id="your-org", secret_key="your-secret-256bit-key")
# Later, on a suspected stolen model
is_yours, confidence = verify_watermark(
suspected_model,
owner_id="your-org",
secret_key="your-secret-256bit-key"
)
print(f"Watermark detected: {is_yours} (confidence: {confidence:.3f})")
Expected: Watermarking changes only stored parameter values, so inference latency is unchanged. Performance benchmarks (MMLU, HellaSwag) should not regress by more than 0.2%.
If it fails:
- Low correlation score on your own model: increase strength to 0.005, but benchmark for accuracy regression first
- False positives: raise threshold (e.g., to 0.85) for stricter matching
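Before shipping a higher strength, a quick back-of-envelope check helps: transformer weights commonly sit around 0.02 RMS (an assumption about your model), so strength 0.001 noise perturbs them by roughly 5% in relative RMS, and 0.005 by roughly 25%. A stdlib-only sketch of that ratio:

```python
import math
import random

def relative_perturbation(
    weight_rms: float, strength: float, n: int = 100_000, seed: int = 0
) -> float:
    """RMS of the watermark noise relative to the RMS of the weights."""
    rng = random.Random(seed)
    # Sample the same Gaussian the watermark uses and measure its RMS
    noise_sq = sum(rng.gauss(0, strength) ** 2 for _ in range(n))
    return math.sqrt(noise_sq / n) / weight_rms
```

A 25% relative perturbation is large enough that benchmark regression becomes plausible, which is why the "benchmark first" caveat above is not optional.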
Step 4: Rate-Limit and Monitor for Extraction Attacks
Model extraction works by querying your API thousands of times to reconstruct weights. Shut it down before it succeeds.
from fastapi import FastAPI, HTTPException, Request
from collections import defaultdict
import time

app = FastAPI()

# Track queries per IP — use Redis in production
query_tracker = defaultdict(list)
EXTRACTION_THRESHOLD = 500  # Queries per hour before flagging
WINDOW_SECONDS = 3600

def is_extraction_attempt(ip: str) -> bool:
    now = time.time()
    # Purge old entries outside the window
    query_tracker[ip] = [t for t in query_tracker[ip] if now - t < WINDOW_SECONDS]
    query_tracker[ip].append(now)
    return len(query_tracker[ip]) > EXTRACTION_THRESHOLD

@app.post("/inference")
async def inference(request: Request, payload: dict):
    ip = request.client.host
    if is_extraction_attempt(ip):
        # Log for security review, don't just silently drop
        # (log_security_event and run_inference are your app's own helpers)
        log_security_event("potential_extraction", ip=ip, count=len(query_tracker[ip]))
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    return run_inference(payload)
Expected: Legitimate users hit the endpoint freely. Automated extraction scripts trip the rate limiter within minutes.
Verification
Run this to confirm your protection stack is working:
# 1. Confirm weights are encrypted at rest
python -c "
with open('weights/model.enc', 'rb') as f:
    data = f.read(8)
# torch.save output is a zip archive; its local header is the 4 bytes PK\x03\x04
assert data[:4] != b'PK\x03\x04', 'FAIL: zip header detected (raw .pt/.bin on disk)'
# Fernet tokens are URL-safe base64 and always start with gAAAAA
assert data[:6] == b'gAAAAA', 'FAIL: not a Fernet token'
print('PASS: Weights encrypted')
"
# 2. Confirm bucket blocks public access
aws s3api get-public-access-block --bucket your-model-bucket
# 3. Verify watermark survives a save/load cycle
python verify_watermark.py --model weights/model.enc --key $SECRET_KEY
You should see:
PASS: Weights encrypted
BlockPublicAcls: true
IgnorePublicAcls: true
BlockPublicPolicy: true
RestrictPublicBuckets: true
Watermark detected: True (confidence: 0.847)
What You Learned
- Encrypting weights at rest prevents direct file theft — keys live in your secrets manager, never on disk
- Least-privilege cloud storage policies block the most common attack vector (misconfigured buckets)
- Statistical watermarking embeds a signal that survives quantization and further fine-tuning — useful if you need to prove ownership
- Rate limiting stops model extraction before an attacker accumulates enough queries to reconstruct meaningful weights
Limitations: Watermarking at strength=0.001 may not survive aggressive adversarial stripping attacks. For high-value models, use a dedicated watermarking library like MarkLLM which implements more robust schemes. Encryption protects the file — it does not prevent a determined attacker with legitimate API access from running extraction queries.
Don't use this if: Your threat model is nation-state level adversaries with physical access to your inference hardware. That requires TEEs (Trusted Execution Environments) — a separate topic.