Problem: SOC 2 Wasn't Designed for AI Systems
Your enterprise prospects are asking for SOC 2 Type II. Your platform uses LLM APIs, vector databases, and async inference pipelines. The standard compliance playbooks don't cover any of that.
You'll learn:
- Which SOC 2 Trust Service Criteria actually apply to AI workloads
- How to instrument model inference for audit logging
- What your auditor will ask about—and what you can automate
Time: 30 min | Level: Advanced
Why This Happens
SOC 2 was designed around traditional SaaS: user auth, data storage, change management. AI startups introduce new attack surfaces—prompt injection, training data leakage, non-deterministic outputs—that don't map cleanly to existing controls.
Auditors are adapting, but most firms are still working from 2022-era frameworks. That means you'll need to translate your AI-specific controls into language the auditor understands.
Common failure points:
- No audit trail for model inputs/outputs (violates CC7.2)
- Third-party LLM API calls treated as unmanaged vendors
- No change management process for prompt templates
- Training data not covered by data classification policy
The 5 Trust Service Criteria That Matter for AI
SOC 2 has five Trust Services Categories: Security (CC), Availability (A), Processing Integrity (PI), Confidentiality (C), and Privacy (P). Security is mandatory in every SOC 2 report; most AI startups scope to Security plus Confidentiality at minimum. Here's where AI workloads create gaps.
Security (CC6–CC9): Access and Change Management
The auditor will ask: who can modify your prompts, model configs, and fine-tuning jobs? Treat prompt templates like application code.
```text
# Example: Prompt template versioning in Git
# Store prompts as versioned files, not database strings
prompts/
  summarize-v1.txt   # Deprecated
  summarize-v2.txt   # Current; modified 2026-01-10, reviewed by @alice
  classify-v3.txt    # Current
```
Required controls:
- Prompt templates in version control with PR reviews
- Role-based access to production inference endpoints
- Audit log for all model configuration changes
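If prompts live in Git as shown above, a small CI check can enforce the naming convention before merge. A minimal sketch, assuming the hypothetical `prompts/<name>-vN.txt` layout (the check and names are illustrative, not a standard tool):

```python
import re
from pathlib import Path

# Matches the convention above: <name>-v<N>.txt
VERSIONED = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*-v\d+\.txt$")

def check_prompt_dir(prompt_dir: str) -> list[str]:
    """Return naming violations; an empty list means the directory passes CI."""
    violations = []
    for path in sorted(Path(prompt_dir).glob("*.txt")):
        if not VERSIONED.match(path.name):
            violations.append(f"{path.name}: missing -vN version suffix")
    return violations
```

Run it as a CI step and fail the build on a non-empty result; combined with branch protection, that gives the auditor evidence that prompt changes can only land through the reviewed path.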
Processing Integrity (PI1): Complete and Accurate Processing
This one catches AI startups off guard. PI1 requires that your system processes data completely, accurately, and as authorized. For AI, that means logging model inputs and outputs with enough detail to reconstruct what happened.
```python
import hashlib
import time
from dataclasses import dataclass, asdict

@dataclass
class InferenceAuditRecord:
    request_id: str
    timestamp: float
    user_id: str
    model_id: str
    # Hash inputs; don't log raw PII
    input_hash: str
    output_hash: str
    latency_ms: int
    token_count: int
    flagged: bool  # Content policy hit

def log_inference(user_id: str, model_id: str, prompt: str, response: str) -> str:
    """Creates a tamper-evident audit record for each inference call."""
    # generate_request_id, measure_latency, count_tokens, check_content_policy,
    # and write_to_immutable_log are application-defined helpers.
    record = InferenceAuditRecord(
        request_id=generate_request_id(),
        timestamp=time.time(),
        user_id=user_id,
        model_id=model_id,
        # Hash instead of store; preserves auditability without logging PII
        input_hash=hashlib.sha256(prompt.encode()).hexdigest(),
        output_hash=hashlib.sha256(response.encode()).hexdigest(),
        latency_ms=measure_latency(),
        token_count=count_tokens(prompt, response),
        flagged=check_content_policy(response),
    )
    write_to_immutable_log(asdict(record))
    return record.request_id
```
Why hashing: you get proof that specific content was processed at a specific time without storing raw user data. Auditors generally accept this approach for PI1.
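One concrete payoff: anyone holding retained content can re-derive the digest and compare it to the audit log. A minimal sketch of that check:

```python
import hashlib

def matches_audit_log(content: str, logged_hash: str) -> bool:
    """Recompute the SHA-256 digest of retained content and compare to the logged value."""
    return hashlib.sha256(content.encode()).hexdigest() == logged_hash
```

Any mismatch means either the content or the log entry changed after the fact, which is exactly the tamper evidence PI1 is looking for.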
Expected: every inference call produces an audit log entry within 50 ms, without blocking the response path.
If it fails:
- Log write timeout: use async logging with a dead-letter queue; never block inference on audit writes
- Hash algorithm questioned: use SHA-256 at minimum; MD5 and SHA-1 will fail security review
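The async-logging fallback can be sketched with a worker thread and an in-memory dead-letter list. A real deployment would use a durable queue and an append-only store, so treat every name below as a placeholder:

```python
import queue
import threading

committed: list[dict] = []     # stand-in for the immutable log store
dead_letters: list[dict] = []  # in production: a durable DLQ, not a list

audit_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

def write_to_immutable_log(record: dict) -> None:
    committed.append(record)  # illustration; a real store is append-only (WORM)

def enqueue_audit(record: dict) -> None:
    """Called on the inference path; never blocks the request."""
    try:
        audit_queue.put_nowait(record)
    except queue.Full:
        dead_letters.append(record)  # park for replay instead of blocking or dropping

def audit_worker() -> None:
    while True:
        record = audit_queue.get()
        try:
            write_to_immutable_log(record)
        except Exception:
            dead_letters.append(record)
        finally:
            audit_queue.task_done()

threading.Thread(target=audit_worker, daemon=True).start()
```

The key property for the audit: records either reach the log or land in the dead-letter store for replay; they are never silently lost, and the inference path never waits on the log.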
Confidentiality (C1): Data Classification
Every piece of data your system touches needs a classification label. Auditors will sample your data flows and check that controls match the label.
```python
import time
from enum import Enum

class DataClassification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"  # Customer data, PII
    RESTRICTED = "restricted"      # Financial, health, legal

# Apply at ingestion, not retroactively
def ingest_document(doc: dict, source: str) -> dict:
    # classify_by_source and get_retention_policy are application-defined helpers
    classification = classify_by_source(source)
    return {
        **doc,
        "classification": classification.value,
        "ingested_at": time.time(),
        "retention_days": get_retention_policy(classification),
    }
```
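`get_retention_policy` is referenced above but left undefined; one plausible shape is a static map from classification to retention days. The numbers below are placeholders, not legal advice; your contracts and regulations set the real values:

```python
from enum import Enum

class DataClassification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"

# Illustrative schedule (days). Regulated data often carries mandated minimums,
# so RESTRICTED is not necessarily the shortest-lived tier.
RETENTION_DAYS = {
    DataClassification.PUBLIC: 365,
    DataClassification.INTERNAL: 365,
    DataClassification.CONFIDENTIAL: 180,
    DataClassification.RESTRICTED: 365,
}

def get_retention_policy(classification: DataClassification) -> int:
    """Look up retention for a classification; a KeyError here means an unclassified tier."""
    return RETENTION_DAYS[classification]
```

Keeping the schedule in one table also gives the auditor a single artifact to sample against your actual deletion jobs.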
The AI-Specific Controls Checklist
Work through this before your readiness assessment.
Data Pipeline Controls
- Training data sources documented with data processing agreements (DPAs)
- PII detection runs on all data before it enters training or RAG pipelines
- Data lineage tracked—you can answer "where did this training example come from?"
- Retention and deletion procedures cover vector embeddings, not just raw data
- Third-party model providers (OpenAI, Anthropic, etc.) listed in vendor risk register
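The embedding and lineage items above are the ones teams miss: deletion has to fan out to the vector index and caches, not just the raw store. A hedged sketch, where `delete_where`, `delete_by_metadata`, and `purge_prefix` stand in for whatever your stores actually expose:

```python
def delete_user_data(user_id: str, raw_store, vector_index, cache) -> dict:
    """Fan a deletion out to every store and return per-store counts,
    so the deletion run itself produces audit evidence."""
    return {
        "raw_documents": raw_store.delete_where(user_id=user_id),
        "embeddings": vector_index.delete_by_metadata({"user_id": user_id}),
        "cached_completions": cache.purge_prefix(f"completion:{user_id}:"),
    }
```

Returning counts (rather than `None`) matters: a deletion report showing zero embeddings removed for a user who had documents ingested is a red flag you want to catch before the auditor does.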
Model and Inference Controls
- All production models have a change ticket linking to evaluation results
- Inference logs retained for minimum 90 days (12 months for financial/health)
- Content moderation applied pre- and post-generation for user-facing features
- Fallback behavior documented for model API outages
- Rate limiting and abuse detection on inference endpoints
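For the rate-limiting item, a per-user token bucket is the usual starting point. A minimal sketch; the sizing is illustrative, and production systems typically enforce this at the API gateway rather than in application code:

```python
import time

class TokenBucket:
    """Per-user token bucket: `burst` requests immediately, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available; False means the request should be rejected."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Rejected calls should still be logged; a spike in denials per user is itself an abuse-detection signal for the monitoring section below.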
Vendor and Third-Party Controls
This is where most AI startups get surprised. Your LLM API provider is a subprocessor. You need their SOC 2 report in your vendor register.
Checklist per AI vendor (OpenAI, Anthropic, Cohere, etc.):
- [ ] Current SOC 2 Type II report obtained (dated within 12 months)
- [ ] Data processing agreement signed
- [ ] Sub-processor listed in your privacy policy
- [ ] Annual review scheduled in vendor management calendar
- [ ] Incident notification SLA documented (usually 72 hours)
Monitoring and Alerting
CC7.2 requires monitoring for security events. For AI systems, add these to your SIEM:
```python
# Alert thresholds to configure; detect_injection_patterns and run_pii_scanner
# are your own detector callables
ALERT_RULES = {
    "prompt_injection_attempt": {
        "pattern": detect_injection_patterns,
        "severity": "HIGH",
        "response": "block_and_log",
    },
    "unusual_output_volume": {
        "threshold": "3x baseline tokens/hour",
        "severity": "MEDIUM",
        "response": "alert_oncall",
    },
    "pii_in_output": {
        "detector": run_pii_scanner,
        "severity": "HIGH",
        "response": "redact_and_alert",
    },
    "model_api_error_rate": {
        "threshold": ">5% over 5 minutes",
        "severity": "MEDIUM",
        "response": "page_oncall",
    },
}
```
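`detect_injection_patterns` in the rules above is a callable you supply. A naive phrase-list version looks like this; real systems layer trained classifiers on top, and these regexes are illustrative, not exhaustive:

```python
import re

# Known injection phrasings; hypothetical starter list, not a complete defense
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore\s+(?:all\s+|any\s+)?(?:previous|prior|above)?\s*instructions",
        r"disregard\s+(?:the|your)\s+system\s+prompt",
        r"reveal\s+(?:the|your)\s+(?:system\s+prompt|instructions)",
    )
]

def detect_injection_patterns(user_input: str) -> bool:
    """True if any known injection phrasing appears in the input."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)
```

Even a crude detector is worth wiring in early: the audit value is less in catch rate than in demonstrating that injection attempts are monitored, logged, and routed to a response.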
What Your Auditor Will Actually Test
Based on recent AI startup audits, expect these specific tests:
Walkthrough of a user data request: Auditor picks a user and asks you to show every place their data lives—including embeddings, fine-tuning datasets, and cached completions. Aim to answer in under 30 minutes.
Change evidence sampling: Auditor picks 5 production changes from the audit period and asks for the ticket, code review, and approval. Prompt template changes count.
Vendor review: Auditor checks your vendor register and asks for SOC 2 reports for your top AI providers.
Incident test: Auditor looks for any security incidents in the audit period and checks your response process. A content policy violation that you caught and remediated is a positive signal if documented correctly.
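The user-data walkthrough gets much faster with a maintained data map that enumerates every store holding user data. A sketch with made-up store names; yours will differ:

```python
# Hypothetical data map: every store that can hold user data, kept under review
USER_DATA_MAP = {
    "postgres.users": "account record, email",
    "s3://uploads/": "raw uploaded documents",
    "vector-index:main": "document embeddings (metadata: user_id)",
    "redis:completion-cache": "cached completions, 24h TTL",
    "s3://finetune-sets/": "fine-tuning examples (if opted in)",
}

def data_locations_for(user_id: str) -> list[str]:
    """Produce the walkthrough answer: one line per store to check for this user."""
    return [
        f"{store} -> filter user_id={user_id}: {desc}"
        for store, desc in USER_DATA_MAP.items()
    ]
```

Keeping the map in version control means the auditor can also see when new stores were added, which doubles as change-management evidence.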
Verification
Run this audit readiness check monthly:
```bash
#!/bin/bash
# soc2_readiness_check.sh
# Assumes site-specific helpers (get_last_inference_log_timestamp,
# check_vendor_report_age) are defined elsewhere and emit WARN lines on failure.

echo "=== SOC 2 Readiness Check ==="

# Check inference logs are flowing
echo "Checking audit log freshness..."
LAST_LOG=$(get_last_inference_log_timestamp)
AGE=$(( $(date +%s) - LAST_LOG ))
[ "$AGE" -gt 300 ] && echo "WARN: No inference logs in last 5 minutes"

# Check vendor report currency
echo "Checking vendor SOC 2 reports..."
check_vendor_report_age "openai" 365
check_vendor_report_age "anthropic" 365

# Check prompt templates are changing under version control
echo "Checking prompt version control..."
git -C ./prompts log --since="30 days ago" --oneline | wc -l

echo "=== Done ==="
```
You should see: Zero warnings on a compliant system. Pipe output to your incident management tool and alert on any WARN lines.
What You Learned
- SOC 2 Processing Integrity (PI1) is the hardest criterion for AI startups—start there
- Hash model inputs/outputs for auditability without storing raw PII
- Prompt templates need change management just like application code
- Your LLM API providers are subprocessors and must be in your vendor register
- Data deletion obligations extend to vector embeddings and cached completions
Limitation: This covers SOC 2 Type II scope-setting and technical controls. Selecting an auditor, scoping your system description, and managing the audit timeline require legal and compliance counsel.
When NOT to use this: If you're pre-revenue and a prospect asks for SOC 2, consider a SOC 2 readiness report or penetration test first—they're faster and often sufficient for early-stage deals.
Tested against AICPA 2017 Trust Services Criteria (updated 2022). Verified with Big 4 audit firms active in AI startup space as of Q1 2026.