Manage API Keys Securely in Serverless AI Architectures

Stop leaking API keys in serverless AI apps. Learn secrets managers, short-lived tokens, and zero-trust patterns that work at scale.

Problem: Your API Keys Are One Leak Away From a $50K Bill

You're calling OpenAI, Anthropic, or Gemini from a serverless function. The keys are in environment variables — or worse, hardcoded. One exposed repo, one misconfigured log, and those keys are scraped and abused within minutes.

You'll learn:

  • Where API keys actually leak in serverless AI stacks
  • How to use secrets managers (AWS, GCP, Azure) without adding latency
  • How to rotate and scope keys so a breach stays contained

Time: 20 min | Level: Intermediate


Why This Happens

Serverless functions run ephemerally, so developers reach for environment variables as the path of least resistance. The problem is that environment variables in Lambda, Cloud Functions, and similar runtimes are stored in plaintext in the platform config — visible to anyone with IAM access to the function, logged by accident, or exposed via framework misconfigs.
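To make the exposure concrete, here's a minimal sketch (the key value is fake) of how a single stray debug log turns an env-var key into a plaintext artifact in your logging system:

```typescript
// Simulates a key stored as a Lambda environment variable (value is fake).
process.env.ANTHROPIC_API_KEY = "sk-ant-example-not-a-real-key";

// A debug statement like this one ships the key to CloudWatch verbatim:
export const logged = JSON.stringify(process.env);

console.log(logged.includes("sk-ant-example-not-a-real-key")); // true: the key is now in your logs
```

Anyone with read access to the log group now has the key, no IAM access to the function required.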

AI API keys are high-value targets because they carry billing authority. A leaked database password is bad; a leaked OpenAI key can also rack up thousands of dollars in charges before anyone notices.

Common symptoms:

  • Keys visible in CI/CD logs (printenv during debugging)
  • Keys committed to .env files that sneak into git history
  • Lambda environment vars accessible to over-permissioned IAM roles
  • Keys shared across environments (dev key = prod key)

Solution

Step 1: Move Keys Out of Environment Variables and Into a Secrets Manager

Never store API keys directly in your function's environment config. Instead, fetch them at cold start from a secrets manager and cache in memory.

// secrets.ts — fetch once, cache for the function lifetime
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({ region: "us-east-1" });

// Module-level cache: survives warm invocations, gone on cold start
let cachedSecrets: Record<string, string> | null = null;

export async function getSecrets(): Promise<Record<string, string>> {
  if (cachedSecrets) return cachedSecrets;

  const response = await client.send(
    new GetSecretValueCommand({ SecretId: "prod/ai-api-keys" })
  );

  // Parse once and cache — avoids repeated network calls
  const parsed = JSON.parse(response.SecretString ?? "{}") as Record<string, string>;
  cachedSecrets = parsed;
  return parsed;
}

// handler.ts — your Lambda function
import { getSecrets } from "./secrets";
import Anthropic from "@anthropic-ai/sdk";

export const handler = async (event: unknown) => {
  const secrets = await getSecrets();

  // Key never touches an env var; lives only in memory
  const anthropic = new Anthropic({ apiKey: secrets.ANTHROPIC_API_KEY });

  const message = await anthropic.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  });

  return { statusCode: 200, body: JSON.stringify(message) };
};

Expected: First invocation adds ~30-80ms latency (secrets fetch). Warm invocations: zero overhead.

If it fails:

  • AccessDeniedException: Your Lambda execution role is missing secretsmanager:GetSecretValue. Add it.
  • ResourceNotFoundException: Check the secret name matches exactly, including the / prefix.
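One refinement worth considering: a module-level cache never picks up a rotated secret until the next cold start. A small TTL on the cache keeps warm containers fresh. This is a generic sketch with an injected fetcher; `withTtlCache` is a hypothetical helper, not part of the AWS SDK:

```typescript
// Wrap any async secret fetcher with a time-to-live cache.
// The fetcher is injected so this works with any secrets backend.
type Fetcher = () => Promise<Record<string, string>>;

export function withTtlCache(fetch: Fetcher, ttlMs: number): Fetcher {
  let cached: Record<string, string> | null = null;
  let fetchedAt = 0;

  return async () => {
    const now = Date.now();
    // Refetch once the entry is older than the TTL, so rotated keys
    // propagate to warm containers within ttlMs instead of waiting
    // for the next cold start.
    if (!cached || now - fetchedAt > ttlMs) {
      cached = await fetch();
      fetchedAt = now;
    }
    return cached;
  };
}
```

A TTL of a few minutes is a reasonable trade-off: warm invocations stay fast, and a rotation propagates without a redeploy.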

Step 2: Scope Your IAM Role to Only the Secrets It Needs

The Lambda execution role should only be able to read the specific secrets it uses — not all secrets in the account.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789:secret:prod/ai-api-keys-*"
    }
  ]
}

The trailing -* accounts for the random six-character suffix AWS appends to secret ARNs. Without it, the resource pattern never matches the real ARN, and requests fail with AccessDeniedException even though the secret name looks correct.
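To see why the suffix matters, here's a small illustration of the glob-style matching (a sketch that mimics how `*` behaves in IAM resource patterns, not the real IAM policy evaluator; the `-AbCdEf` suffix is a made-up example):

```typescript
// Mimic IAM's '*' wildcard in resource patterns as a regex,
// escaping every other character so only '*' is special.
function matchesResource(pattern: string, arn: string): boolean {
  const regex = new RegExp(
    "^" +
      pattern
        .split("*")
        .map(s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*") +
      "$"
  );
  return regex.test(arn);
}

// Real secret ARNs carry a random 6-character suffix, e.g. -AbCdEf
const arn =
  "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/ai-api-keys-AbCdEf";

console.log(matchesResource(
  "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/ai-api-keys", arn)); // false
console.log(matchesResource(
  "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/ai-api-keys-*", arn)); // true
```

The exact-name pattern never matches the suffixed ARN, which is why the deny looks mysterious until you know about the suffix.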

Do this for every function separately. A function that only calls Anthropic should not have access to your Stripe or database secrets.


Step 3: Create Scoped, Rotatable API Keys Per Environment

Most AI providers let you create multiple API keys. Use this. Create separate keys for dev, staging, and prod — and if your provider supports it, scope them to specific models or rate limits.

# Store secrets separately per environment
aws secretsmanager create-secret \
  --name "prod/ai-api-keys" \
  --secret-string '{"ANTHROPIC_API_KEY":"sk-ant-prod-xxxx"}'

aws secretsmanager create-secret \
  --name "staging/ai-api-keys" \
  --secret-string '{"ANTHROPIC_API_KEY":"sk-ant-staging-xxxx"}'

Then in your deployment config, pass only the environment name — never the key value itself:

# serverless.yml (Serverless Framework)
functions:
  aiHandler:
    handler: handler.handler
    environment:
      # This is NOT a secret; it's just a path
      SECRET_NAME: "prod/ai-api-keys"

// Update secrets.ts to use the env var for the path only
const secretName = process.env.SECRET_NAME!; // "prod/ai-api-keys"

const response = await client.send(
  new GetSecretValueCommand({ SecretId: secretName })
);

If it fails:

  • Wrong environment keys used in prod: Double-check SECRET_NAME in your deployment pipeline. Promote the config, not the keys.
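A cheap guard against exactly that failure mode is to derive the secret path from the deployment stage and refuse anything unexpected. A sketch, assuming your pipeline sets a stage name (the `secretNameForStage` helper and the stage list are illustrative):

```typescript
// Map a deployment stage to its secret path, rejecting unknown stages
// so a typo in the pipeline fails loudly instead of loading the wrong keys.
const ALLOWED_STAGES = new Set(["dev", "staging", "prod"]);

export function secretNameForStage(stage: string): string {
  if (!ALLOWED_STAGES.has(stage)) {
    throw new Error(`Unknown stage "${stage}", refusing to guess a secret path`);
  }
  return `${stage}/ai-api-keys`;
}

console.log(secretNameForStage("staging")); // "staging/ai-api-keys"
```

A misspelled stage now fails at cold start instead of silently running prod traffic on dev keys.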

Step 4: Enable Automatic Rotation

Set a rotation schedule so a leaked key has a limited blast radius. AWS Secrets Manager supports rotation with a Lambda rotator.

# Rotate every 30 days
aws secretsmanager rotate-secret \
  --secret-id "prod/ai-api-keys" \
  --rotation-rules AutomaticallyAfterDays=30

For AI API keys specifically, the rotation logic is: create a new key at the provider, update the secret, delete the old key. This is manual for most AI providers today, but you can script it:

// rotate.ts — called by AWS rotation Lambda
import {
  SecretsManagerClient,
  PutSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({ region: "us-east-1" });

// createNewAnthropicKey, verifyKey, and deleteOldAnthropicKey are placeholders
// for your provider-specific logic; most AI providers have no public
// key-management API yet, so these may wrap a dashboard automation or runbook.
export async function rotateKey() {
  // 1. Create new key via provider API
  const newKey = await createNewAnthropicKey();

  // 2. Verify the new key works before touching the stored secret
  await verifyKey(newKey);

  // 3. Update secret with new value
  await client.send(new PutSecretValueCommand({
    SecretId: "prod/ai-api-keys",
    SecretString: JSON.stringify({ ANTHROPIC_API_KEY: newKey }),
  }));

  // 4. Delete old key from provider
  await deleteOldAnthropicKey();
}
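During rotation there is a window where warm containers still hold the old key. A small fallback wrapper keeps requests succeeding through that window. This is a provider-agnostic sketch; `withKeyRetry` and its injected `call`/`refetchKey` functions are hypothetical, and the 401 check assumes your SDK surfaces an HTTP status on auth errors:

```typescript
// Try the cached key first; on an auth failure, refetch the secret once
// and retry with the fresh key. Both functions are injected so the
// pattern works with any provider SDK.
export async function withKeyRetry<T>(
  call: (key: string) => Promise<T>,
  key: string,
  refetchKey: () => Promise<string>
): Promise<T> {
  try {
    return await call(key);
  } catch (err: any) {
    // A 401 here usually means our cached key was rotated out from under us
    if (err?.status !== 401) throw err;
    const freshKey = await refetchKey();
    return call(freshKey);
  }
}
```

Paired with the cached `getSecrets` from Step 1 (bust the cache before refetching), this makes rotation invisible to callers.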

Verification

Deploy your function and confirm the key never appears in plaintext:

# Check that no keys are in environment config
aws lambda get-function-configuration \
  --function-name your-ai-function \
  --query 'Environment.Variables'

# Should return your SECRET_NAME path, not the actual key value
# Verify the function can fetch and use the secret
aws lambda invoke \
  --function-name your-ai-function \
  --payload '{}' \
  response.json && cat response.json

You should see a successful AI response with no key visible anywhere in the config or logs.

Also run a quick audit on your git history — this catches the most common source of leaks:

# Scan for accidentally committed secrets
# (install git-secrets first: https://github.com/awslabs/git-secrets)
git secrets --scan-history
# Or use TruffleHog for a deeper scan
trufflehog git file://. --since-commit HEAD~50
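You can also add a cheap regex check to CI as a first line of defense. A sketch; the patterns below are illustrative shapes for common key prefixes, not exhaustive, and the example key is fake:

```typescript
// Minimal secret-pattern scanner for CI. Patterns cover common
// key-prefix shapes; extend the list for the providers you use.
const KEY_PATTERNS = [
  /sk-ant-[A-Za-z0-9-]{10,}/, // Anthropic-style prefix
  /sk-[A-Za-z0-9]{20,}/,      // OpenAI-style prefix
];

export function findLeakedKeys(text: string): string[] {
  return KEY_PATTERNS.flatMap(pattern => {
    const match = text.match(pattern);
    return match ? [match[0]] : [];
  });
}

console.log(findLeakedKeys('apiKey: "sk-ant-prod-abcdefghij"').length); // 1
console.log(findLeakedKeys("no secrets here").length); // 0
```

This won't replace a history scanner, but it catches the obvious paste-into-source mistake before the commit lands.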

What You Learned

  • Secrets managers add minimal latency when you cache at the module level — the cold start cost is worth the security gain
  • Scoped IAM policies mean a compromised function can't pivot to other secrets
  • Separate keys per environment means a dev leak can't drain your prod budget
  • Rotation limits blast radius — a key that rotates every 30 days is only useful to an attacker for up to 30 days

Limitations: This pattern covers AWS. GCP Secret Manager and Azure Key Vault follow the same model — fetch at init, cache in memory, scope access to the specific function identity. The SDK calls differ but the architecture is identical.

When NOT to use this: If you're prototyping locally, env vars in a .env file that's in .gitignore are fine. The overhead of secrets manager setup isn't worth it for throwaway scripts. Draw the line at anything that touches production traffic or a shared API key.


Tested on AWS Lambda (Node.js 22.x), AWS Secrets Manager, Anthropic SDK 0.36+