Create production-quality test data for relational databases in minutes using Claude's API - no manual CSV writing required.
You'll learn:
- Generate complex relational data with foreign key constraints
- Build a reusable seeding script with Claude API
- Handle edge cases like circular dependencies and data consistency
Time: 20 min | Level: Intermediate
Problem: Test Data That Actually Makes Sense
You need realistic data for your SQL database - users, orders, products with proper relationships - but manual seeding is tedious and fake data libraries don't understand your schema.
Common symptoms:
- Orphaned foreign keys breaking queries
- Unrealistic data patterns (all orders on same date)
- Time wasted writing INSERT statements
- Can't test edge cases without custom data
Why AI-Powered Seeding Works
Traditional seeders use templates or random generation, but they don't understand relational constraints. Claude can read your schema and generate contextually correct data that respects foreign keys, business logic, and realistic patterns.
What makes this different:
- Generates data that matches your domain (e-commerce, SaaS, healthcare)
- Maintains referential integrity automatically
- Creates edge cases on demand (high-value customers, canceled orders)
- Adapts to schema changes without rewriting code
Solution
Step 1: Set Up the Seeding Script
Create a Node.js script that reads your schema and calls Claude:
// seed-db.js
import Anthropic from '@anthropic-ai/sdk';
import { createPool } from '@vercel/postgres';
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY
});
const db = createPool({
connectionString: process.env.DATABASE_URL
});
async function generateSeedData(schema, rowCounts) {
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 4000,
messages: [{
role: 'user',
content: `Generate SQL INSERT statements for this schema:
${schema}
Requirements:
- ${rowCounts.users} users
- ${rowCounts.products} products
- ${rowCounts.orders} orders (realistic distribution across users)
- Maintain all foreign key constraints
- Use realistic data (names, emails, prices)
- Return ONLY valid SQL INSERT statements, no explanations`
}]
});
return response.content[0].text;
}
Why this works: Claude sees the entire schema context, so it generates foreign keys that reference actual IDs it created earlier.
Step 2: Define Your Schema
const schema = `
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
price DECIMAL(10,2) NOT NULL,
stock INT DEFAULT 0
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id),
total DECIMAL(10,2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id),
product_id INT REFERENCES products(id),
quantity INT NOT NULL,
price DECIMAL(10,2) NOT NULL
);
`;
const rowCounts = {
users: 50,
products: 20,
orders: 100
};
Expected: Schema includes foreign keys - Claude will respect these when generating IDs.
Step 3: Execute the Seed
async function seed() {
try {
// Generate SQL from Claude
console.log('Generating seed data...');
const sql = await generateSeedData(schema, rowCounts);
// Execute in transaction
await db.query('BEGIN');
await db.query(sql);
await db.query('COMMIT');
console.log('✓ Database seeded successfully');
} catch (error) {
await db.query('ROLLBACK');
console.error('Seed failed:', error.message);
} finally {
await db.end();
}
}
seed();
Run it:
node seed-db.js
If it fails:
- Error: "duplicate key value": Add
TRUNCATE TABLE users, orders, products CASCADE;before seeding - Error: "foreign key violation": Check table order - Claude might generate orders before users
Step 4: Handle Table Dependencies
Fix ordering issues by explicitly sequencing tables:
async function generateSeedData(schema, rowCounts) {
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 4000,
messages: [{
role: 'user',
content: `Generate SQL INSERT statements in this exact order:
1. users table (${rowCounts.users} rows)
2. products table (${rowCounts.products} rows)
3. orders table (${rowCounts.orders} rows, reference user IDs 1-${rowCounts.users})
4. order_items table (2-5 items per order, reference existing order and product IDs)
Schema:
${schema}
Rules:
- Start IDs at 1 for each table
- orders.user_id must be between 1 and ${rowCounts.users}
- orders.total should match sum of order_items.price * quantity
- Return raw SQL only, no markdown`
}]
});
return response.content[0].text;
}
Why this matters: Explicit ordering prevents FK violations. Claude follows instructions to reference valid IDs.
Step 5: Add Edge Cases
Generate specific test scenarios:
const edgeCasePrompt = `
Also include:
- 3 users with no orders (test null relationships)
- 1 high-value customer with 10+ orders
- 2 products with 0 stock (test inventory logic)
- 5 cancelled orders (status = 'cancelled')
`;
// Append to existing prompt
content: `${basePrompt}\n\n${edgeCasePrompt}`
Expected: SQL includes outliers your tests need, not just average-case data.
Verification
# Check row counts
psql $DATABASE_URL -c "SELECT
(SELECT COUNT(*) FROM users) as users,
(SELECT COUNT(*) FROM orders) as orders,
(SELECT COUNT(*) FROM products) as products;"
You should see: Exact counts matching your rowCounts config.
Validate relationships:
-- Check for orphaned orders
SELECT COUNT(*) FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE u.id IS NULL;
Expected: 0 rows (all foreign keys valid).
Advanced: Reusable Seeder Class
class AISeedGenerator {
constructor(apiKey, dbUrl) {
this.anthropic = new Anthropic({ apiKey });
this.db = createPool({ connectionString: dbUrl });
}
async seed(schema, config) {
const sql = await this.generateSQL(schema, config);
await this.execute(sql);
}
async generateSQL(schema, config) {
// Add retry logic for rate limits
let attempts = 0;
while (attempts < 3) {
try {
const response = await this.anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 4000,
messages: [{
role: 'user',
content: this.buildPrompt(schema, config)
}]
});
return this.cleanSQL(response.content[0].text);
} catch (error) {
if (error.status === 429) {
await this.sleep(2000 * (attempts + 1));
attempts++;
} else throw error;
}
}
}
cleanSQL(raw) {
// Remove markdown code blocks if present
return raw
.replace(/```sql\n?/g, '')
.replace(/```\n?/g, '')
.trim();
}
async execute(sql) {
await this.db.query('BEGIN');
try {
// Split on semicolons, execute sequentially
const statements = sql.split(';').filter(s => s.trim());
for (const stmt of statements) {
await this.db.query(stmt);
}
await this.db.query('COMMIT');
} catch (error) {
await this.db.query('ROLLBACK');
throw error;
}
}
buildPrompt(schema, config) {
return `Generate SQL INSERTs for this schema with ${config.rows} realistic records per table:
${schema}
Requirements:
${config.requirements || '- Realistic data\n- Valid foreign keys'}
Return raw SQL only.`;
}
sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// Usage
const seeder = new AISeedGenerator(
process.env.ANTHROPIC_API_KEY,
process.env.DATABASE_URL
);
await seeder.seed(schema, {
rows: 100,
requirements: '- Include edge cases\n- Realistic timestamps'
});
What You Learned
- AI understands relational schemas and generates valid foreign keys
- Explicit table ordering prevents constraint violations
- Edge cases can be prompted, not manually coded
- Claude API handles complex data generation better than templates
Limitations:
- Large schemas (50+ tables) may need chunking
- Costs ~$0.01-0.05 per seed run depending on data volume
- Generated data isn't deterministic (varies per run)
When NOT to use this:
- Production data migrations (use proper ETL tools)
- Compliance-sensitive data (use anonymization tools)
- Performance testing needing millions of rows (use bulk generators)
Tested with Claude Sonnet 4, PostgreSQL 16.x, Node.js 22.x