Problem: Your Node.js App Crashes After 6 Hours
Your API runs fine during testing but crashes in production with FATAL ERROR: Reached heap limit. Memory usage climbs from 200MB to 2GB over hours, then the process dies.
You'll learn:
- How to capture heap snapshots without downtime
- Use AI tools to identify leak patterns automatically
- Fix the 3 most common Node.js memory leak causes
Time: 20 min | Level: Intermediate
Why This Happens
Node.js doesn't garbage collect objects that still have references, even if you'll never use them again. Common culprits: event listeners that never detach, closures holding stale data, and global caches that grow forever.
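To see the mechanism in isolation, here is a deliberately leaky sketch (the module and numbers are illustrative, not taken from a real app):

```javascript
// leak-demo.js — a deliberately leaky module
const seen = []; // module-level array: lives as long as the process

function handleRequest(id) {
  // Each call allocates ~1MB and parks it in `seen`.
  // V8 cannot collect these buffers: `seen` still references them.
  seen.push({ id, payload: Buffer.alloc(1024 * 1024) });
  return seen.length;
}

// Simulate 100 requests: heapUsed grows by ~100MB and never shrinks
for (let i = 0; i < 100; i++) handleRequest(i);
console.log(`retained: ${seen.length} objects`);
console.log(`heapUsed: ${Math.round(process.memoryUsage().heapUsed / 1024 / 1024)}MB`);
```

Run it with a small heap (`node --max-old-space-size=64 leak-demo.js`) and you reproduce the same "Reached heap limit" failure mode as the production crash above, just faster.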
Common symptoms:
- Memory usage grows 10-50MB per hour under load
- process.memoryUsage().heapUsed never decreases
- App slows down before crashing with heap errors
- Happens in production, not local development
Solution
Step 1: Install AI-Powered Profiling Tools
# Install modern profiling stack
npm install --save-dev @clinic/doctor @clinic/heapprofiler
npm install -g clinic
npm install -g memlab  # Meta's AI leak detector (global, for the CLI used in Step 3)
Why these tools:
- clinic provides visual flamegraphs with AI recommendations
- memlab uses ML to detect leak patterns automatically
- Both work in production without significant overhead
Step 2: Capture Baseline Heap Snapshot
// Add to your app startup (app.js or server.js)
const v8 = require('v8');
const fs = require('fs');
// Snapshot endpoint (protect in production! v8.writeHeapSnapshot is
// synchronous and pauses the event loop while the file is written)
app.get('/heap-snapshot', (req, res) => {
  const filename = v8.writeHeapSnapshot(`heap-${Date.now()}.heapsnapshot`);
  res.json({ snapshot: filename, size: fs.statSync(filename).size });
});
// Auto-snapshot (once) when a memory threshold is crossed
let snapshotTaken = false;
setInterval(() => {
  const used = process.memoryUsage().heapUsed / 1024 / 1024;
  if (used > 1500 && !snapshotTaken) { // 1.5GB threshold
    snapshotTaken = true; // one snapshot, not one per minute
    console.warn(`High memory: ${used.toFixed(0)}MB`);
    v8.writeHeapSnapshot(`auto-${Date.now()}.heapsnapshot`);
  }
}, 60000); // Check every minute
Expected: .heapsnapshot files in your project root when memory spikes
If it fails:
- Error: "Permission denied": Ensure app has write access to current directory
- File too large: Memory already critical, restart and capture earlier
Step 3: Run AI Analysis with Memlab
# Create test scenario file
cat > leak-scenario.js << 'EOF'
module.exports = {
url: () => 'http://localhost:3000',
// AI watches for objects growing between iterations
action: async (page) => {
await page.click('[data-load-data]'); // Your heavy operation
await page.waitForTimeout(2000);
},
// Memlab AI detects what should be GC'd but isn't
back: async (page) => {
await page.click('[data-clear]');
}
};
EOF
# Run AI-powered leak detection
memlab run --scenario leak-scenario.js --work-dir ./memlab-results
# Get AI analysis report
memlab analyze unbound-object --work-dir ./memlab-results
What the AI finds:
- Detached DOM nodes still in memory
- Event listeners never removed
- Closures capturing large objects
- Cached data without size limits
Expected output:
🔍 AI detected 3 potential leaks:
1. EventEmitter leak (confidence: 94%)
└─ 847 listeners on 'data' event
└─ Source: /src/services/websocket.js:23
└─ Fix: Call emitter.removeListener() in cleanup
2. Closure leak (confidence: 87%)
└─ 1,240 objects retained by timer callback
└─ Source: /src/cache/refresh.js:15
└─ Fix: Clear interval and dereference cache
3. Global array growth (confidence: 99%)
└─ requestLog[] grew to 45,000 items
└─ Source: /src/middleware/logger.js:8
└─ Fix: Implement circular buffer with max size
Step 4: Fix Common Leak Patterns
Pattern 1: Event Listener Leak
// ❌ Before: Listener never removed
class DataFetcher {
constructor(emitter) {
this.emitter = emitter;
emitter.on('data', (data) => this.process(data));
}
}
// ✅ After: Proper cleanup
class DataFetcher {
constructor(emitter) {
this.emitter = emitter;
this.handler = (data) => this.process(data);
emitter.on('data', this.handler);
}
destroy() {
this.emitter.removeListener('data', this.handler);
this.handler = null; // Break reference
}
}
Why this works: Each instance no longer leaves behind a listener when destroyed. Without cleanup, 1000 requests = 1000 orphaned listeners.
Pattern 2: Timer/Interval Leak
// ❌ Before: Timer keeps references alive
function scheduleRefresh(cache) {
setInterval(() => {
cache.refresh(); // Cache object never gets GC'd
}, 60000);
}
// ✅ After: Clearable timer
function scheduleRefresh(cache) {
  const timerId = setInterval(() => cache.refresh(), 60000);
  // Expose a way to stop the timer; clearing it releases the
  // closure's reference to the cache so it can be GC'd
  cache.stopTimer = () => clearInterval(timerId);
  return timerId;
}
Why this works: The interval can now be stopped, and clearing it drops the closure's reference to the cache so V8 can collect it. The original version's timer ran forever and pinned the cache in memory. (Note that a truthiness check on the cache inside the callback wouldn't help: the closure itself keeps the cache reachable, so it never becomes falsy.)
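On Node 14.6+ you can go further with WeakRef, so the timer observes the cache without keeping it alive at all — a sketch of that variant:

```javascript
// Variant: the timer holds only a WeakRef, so it no longer pins the
// cache; if the cache is garbage collected, the timer shuts itself down.
function scheduleRefresh(cache) {
  const ref = new WeakRef(cache);
  const timerId = setInterval(() => {
    const target = ref.deref();
    if (!target) {
      clearInterval(timerId); // cache is gone: stop ticking
      return;
    }
    target.refresh();
  }, 60000);
  timerId.unref(); // don't keep the process alive just for this timer
  return timerId;
}
```

GC timing is nondeterministic, so treat this as defense in depth, not a replacement for an explicit stopTimer.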
Pattern 3: Unbounded Cache Growth
// ❌ Before: Cache grows forever
const requestCache = new Map();
app.get('/api/data/:id', (req, res) => {
requestCache.set(req.params.id, { data: fetchData() });
res.json(requestCache.get(req.params.id));
});
// ✅ After: LRU cache with size limit
const { LRUCache } = require('lru-cache'); // v7+ API
const requestCache = new LRUCache({
max: 500, // Max 500 items
maxSize: 50 * 1024 * 1024, // 50MB limit
sizeCalculation: (value) => JSON.stringify(value).length,
ttl: 1000 * 60 * 5, // 5 minute TTL
});
app.get('/api/data/:id', (req, res) => {
let cached = requestCache.get(req.params.id);
if (!cached) {
cached = { data: fetchData() };
requestCache.set(req.params.id, cached);
}
res.json(cached);
});
Why this works: LRU automatically evicts oldest items when limits hit. Original Map grew unbounded - 100k requests = 100k cached items.
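If pulling in a dependency is not an option, a minimal bounded Map covers the simple cases (a sketch; unlike lru-cache it has no TTL or byte accounting):

```javascript
// Minimal FIFO-bounded cache built on Map's insertion order.
// Map iterates keys oldest-first, so the first key is the eviction target.
function createBoundedCache(maxItems) {
  const store = new Map();
  return {
    get: (key) => store.get(key),
    set(key, value) {
      if (store.size >= maxItems) {
        // Evict the oldest entry (first inserted key)
        store.delete(store.keys().next().value);
      }
      store.set(key, value);
    },
    get size() { return store.size; },
  };
}

const cache = createBoundedCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.set('c', 3); // evicts 'a'
console.log(cache.size); // 2
```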
Step 5: Verify with Clinic Doctor
# Profile your fixed app under load
clinic doctor -- node app.js &
# Generate realistic traffic (adjust for your app)
npx autocannon -c 100 -d 60 http://localhost:3000/api/heavy-endpoint
# Kill app, AI analyzes results
kill %1
# Open AI report
clinic doctor --open
You should see:
- Flat memory usage over time (not climbing)
- Heap size stabilizes around baseline
- AI reports "No anomalies detected"
AI Doctor flags:
- 🟢 Green: Memory stable, GC working properly
- 🟡 Yellow: Potential issue, investigate further
- 🔴 Red: Active leak detected, review code
Verification
Real-Time Memory Monitoring
// Add to production app for ongoing monitoring
const memwatch = require('@airbnb/node-memwatch');
memwatch.on('leak', (info) => {
  // `info` carries two fields: `growth` (heap bytes gained across
  // consecutive GCs) and `reason` (a human-readable summary)
  console.error('Memory leak detected:', {
    growth: info.growth,
    reason: info.reason
  });
  // Alert your monitoring system (Sentry, Datadog, etc.)
});
memwatch.on('stats', (stats) => {
console.log('Heap stats:', {
currentBase: stats.current_base,
heapSize: stats.estimated_base,
minSize: stats.min,
maxSize: stats.max
});
});
Expected behavior:
- No 'leak' events during normal operation
- heapSize fluctuates but doesn't trend upward
- GC runs regularly (visible in 'stats' events)
Load Test Verification
# 10 minute sustained load test:
# -c 50 connections, -d 600 seconds, -w 4 workers
npx autocannon -c 50 -d 600 -w 4 http://localhost:3000
# Watch memory (RSS) during the test; "[n]ode" stops grep matching itself
watch -n 5 'ps aux | grep "[n]ode app.js" | awk "{print \$6/1024\" MB\"}"'
Success criteria:
- Memory stabilizes within first 2 minutes
- Max memory < 2x baseline memory
- No crashes or heap errors
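If watch and ps aren't available (slim containers, Windows), the process can sample itself — a minimal sketch:

```javascript
// Sample RSS and heap every 5s during the load test; pipe stdout to a
// file and inspect the trend afterwards.
const samples = [];
const toMB = (n) => Math.round(n / 1024 / 1024);

function sampleMemory() {
  const { rss, heapUsed } = process.memoryUsage();
  samples.push({ t: Date.now(), rss: toMB(rss), heapUsed: toMB(heapUsed) });
  console.log(`rss=${toMB(rss)}MB heapUsed=${toMB(heapUsed)}MB`);
}

const sampler = setInterval(sampleMemory, 5000);
sampler.unref(); // never keep the process alive just for monitoring
```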
What You Learned
- V8 heap snapshots capture exact memory state for analysis
- AI tools like Memlab detect leak patterns faster than manual analysis
- Three root causes: event listeners, timers, unbounded caches
- LRU caches prevent unbounded growth automatically
Limitations:
- AI analysis needs 3+ iterations to detect patterns reliably
- Native addons can leak outside V8 heap (invisible to these tools)
- Some "leaks" are legitimate caches that need tuning, not bugs
When NOT to use AI profiling:
- Consistent memory usage (no growth) = not a leak, just high baseline
- Memory spikes during expected operations (large file uploads)
- Short-lived processes (serverless functions under 15 minutes)
Production Checklist
Before deploying your fixes:
- Heap snapshots auto-save to persistent storage (not ephemeral containers)
- Memory alerts configured at 70% of max heap
- LRU cache limits tested under peak load
- All event listeners have corresponding removeListener calls
- Timers are cleared in cleanup/shutdown handlers
- Monitoring dashboard shows memory trends over 7 days
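The listener and timer items on the checklist are easiest to enforce with a single cleanup registry, sketched below (the heartbeat interval is an illustrative example, not from the steps above):

```javascript
// Anything that allocates a timer or listener also registers its undo.
const cleanupTasks = [];
const onCleanup = (fn) => cleanupTasks.push(fn);

// Illustrative registration
const heartbeat = setInterval(() => {}, 30000);
onCleanup(() => clearInterval(heartbeat));

function shutdown(signal) {
  console.log(`${signal} received, cleaning up...`);
  for (const task of cleanupTasks) {
    try { task(); } catch (err) { console.error('cleanup failed:', err); }
  }
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```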
Common AI Profiling Pitfalls
False Positives
// AI might flag this as a leak (it's not)
const imageCache = new Map(); // Intentional long-lived cache
// Add comment to help AI understand intent
const imageCache = new Map(); // CACHE: Intentional persistence, see docs/caching.md
Missing Leaks in Async Code
// AI struggles with leaks in promise chains
function fetchData() {
const largeBuffer = Buffer.alloc(10 * 1024 * 1024);
return fetch('/api/data')
.then(res => res.json())
.then(data => {
// largeBuffer can stay reachable via the closure's context until the chain settles
return processData(data);
});
}
// Fix: Don't capture unnecessary variables
function fetchData() {
return fetch('/api/data')
.then(res => res.json())
.then(data => {
const largeBuffer = Buffer.alloc(10 * 1024 * 1024); // Scoped properly
return processData(data);
});
}
Advanced: CI/CD Memory Regression Tests
# .github/workflows/memory-test.yml
name: Memory Leak Detection
on: [pull_request]
jobs:
memlab:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run app and Memlab
run: |
npm start &
sleep 10
memlab run --scenario ./tests/leak-scenario.js
- name: Check for leaks
run: |
memlab analyze unbound-object > report.txt
if grep -qE "confidence: (100|[89][0-9])%" report.txt; then
echo "❌ High-confidence leak detected"
exit 1
fi
- uses: actions/upload-artifact@v4
with:
name: memlab-report
path: report.txt
This fails CI if AI detects leaks above 80% confidence, preventing merging leaky code.
Tested on Node.js 22.x, Memlab 1.2.0, Clinic.js 13.x, Ubuntu 24.04 & macOS Sonoma