Problem: Your App is Slow But You Don't Know Where
Your production service has 2-second response times but traditional logs show nothing. You need to find which functions are consuming CPU without drowning in raw profiling data.
You'll learn:
- How to capture production-safe flame graphs
- What flame graph patterns reveal performance issues
- Using AI to analyze flame graphs faster than manual inspection
Time: 15 min | Level: Intermediate
Why This Happens
Traditional logging misses CPU-bound issues because logs show what happened, not where time was spent. Flame graphs visualize the call stack hierarchy and time distribution across your entire codebase.
Common symptoms:
- High CPU usage but unclear cause
- Slow endpoints with no obvious N+1 queries
- Performance degradation under load
- "Everything looks fine" in application logs
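Logs cannot show this, but even the standard-library profiler can. A minimal, self-contained demo (synthetic workload, not from any real service) where the handler looks trivial in logs yet one function owns nearly all the CPU time:

```python
import cProfile
import hashlib
import io
import pstats

def slow_hash(data: bytes) -> bytes:
    # CPU-bound loop; stands in for a tight crypto routine like bcrypt
    for _ in range(20_000):
        data = hashlib.sha256(data).digest()
    return data

def handle_request() -> bytes:
    # The "application logic" is trivial; the hashing dominates CPU time
    return slow_hash(b"user-password")

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
# The profiler names the hot function; a log line would not
print("slow_hash" in report)
```

A flame graph is the same sampling idea applied continuously and rendered visually, so hotspots like `slow_hash` show up as wide blocks instead of rows in a stats table.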
Solution
Step 1: Capture a Flame Graph
For Node.js/Bun:
# Install profiling tools
npm install -g 0x clinic
# Profile your running application
0x --collect-only node server.js
# Run load test, then Ctrl+C
0x --visualize-only profile-*/
# Generates flamegraph.html
For Python:
pip install py-spy --break-system-packages
# Profile running process without restarting
sudo py-spy record -o profile.svg --pid $(pgrep -f "python api.py")
# Wait 30-60 seconds, then Ctrl+C
For Rust:
# Install the cargo subcommand first: cargo install flamegraph
cargo flamegraph --bin your-app
# Run workload, then stop
# Generates flamegraph.svg
Expected: SVG or HTML file showing colored stack visualization
If it fails:
- Permission denied: use `sudo` for py-spy, or lower the kernel's `perf_event_paranoid` setting
- No symbols: build with debug symbols (`RUSTFLAGS=-g` for Rust, `--perf-basic-prof` for Node.js)
Step 2: Read the Flame Graph Basics
Structure:
- Width = Time spent (wider = more CPU usage)
- Height = Call stack depth (top = leaf functions doing work)
- Color = Library or module (grouping, not performance)
Key patterns to spot:
┌────────────────────────────────────┐ ← Entire program runtime
│ main()                             │
├───────────────┬────────────────────┤
│ handleRequest │ dbQuery() ██████   │ ← Wide box = CPU hotspot
├───────┬───────┴──────┬─────────────┤
│ parse │ validateAuth │ SELECT      │
└───────┴──────────────┴─────────────┘
(Drawn top-down for readability; most tools render flame graphs inverted, with main() at the bottom and leaf functions on top.)
What this reveals:
- Wide plateaus = CPU bottlenecks (function consuming most time)
- Tall towers = Deep call stacks (often recursion or callback hell)
- Repeated patterns = Opportunities to cache (same function called many times)
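"Width" has a concrete meaning in the collapsed-stack text format most flame graph tools consume: each line is a semicolon-joined stack plus a sample count, and a box's width is its share of total samples. A minimal sketch with synthetic data (the frame names mirror the diagram above, not any real profiler output):

```python
from collections import defaultdict

# Collapsed-stack format: "frame1;frame2;...;leaf <samples>"
folded = [
    "main;handleRequest;parse 10",
    "main;handleRequest;validateAuth 15",
    "main;dbQuery;SELECT 55",
]

total = defaultdict(int)      # samples with the frame anywhere on the stack (box width)
self_time = defaultdict(int)  # samples where the frame is the leaf (doing the work)

grand_total = 0
for line in folded:
    stack, count_str = line.rsplit(" ", 1)
    count = int(count_str)
    grand_total += count
    frames = stack.split(";")
    for frame in set(frames):  # set() avoids double-counting recursive frames
        total[frame] += count
    self_time[frames[-1]] += count

# A frame's width in the rendered flame graph is total[frame] / grand_total
widths = {f: total[f] / grand_total for f in total}
print(widths["dbQuery"])  # 0.6875 → the widest non-root box, i.e. the hotspot
```

This is why "width matters more than height": `dbQuery` owns 68.75% of samples here regardless of how deep its sub-stacks go.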
Step 3: Use AI to Analyze the Flame Graph
Save your flame graph as an image, then use an AI tool:
Using Claude with screenshot:
Upload flame graph PNG and prompt:
"Analyze this flame graph from my Node.js API. Current issue:
/api/users endpoint takes 2.1s under 100 req/s load.
Framework: Express + Prisma ORM. What's the bottleneck?"
AI will identify:
- Specific function names consuming disproportionate time
- Unexpected library calls (e.g., excessive JSON serialization)
- Anti-patterns (sync crypto in async handlers)
- Comparison with typical patterns for your stack
Example AI insight:
"The widest section shows
bcrypt.hashSync()taking 68% of CPU time in the request handler. This is blocking the event loop. Solution: Usebcrypt.hash()async version or move to worker threads."
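The same blocking pattern exists outside Node.js. A minimal asyncio sketch, with `hashlib.pbkdf2_hmac` standing in for bcrypt (hypothetical handler names, not a real framework API):

```python
import asyncio
import hashlib

def cpu_hash(password: bytes) -> bytes:
    # CPU-bound; blocks whichever thread runs it (like bcrypt.hashSync)
    return hashlib.pbkdf2_hmac("sha256", password, b"salt", 100_000)

async def blocking_handler() -> bytes:
    # Runs on the event-loop thread: no other coroutine runs meanwhile
    return cpu_hash(b"pw")

async def nonblocking_handler() -> bytes:
    # Offload to a worker thread (analogous to bcrypt.hash / worker_threads)
    return await asyncio.to_thread(cpu_hash, b"pw")

# Both produce the same hash; only the scheduling differs
result = asyncio.run(nonblocking_handler())
print(len(result))  # 32-byte SHA-256-derived key
```

In a flame graph of the blocking version, `cpu_hash` appears as a wide plateau directly under the event loop; after offloading, that time moves to worker-thread stacks and the loop stays responsive.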
Step 4: Fix the Identified Bottleneck
Based on AI analysis, apply targeted fixes:
Common fixes by pattern:
Pattern: Wide bcrypt/crypto blocks
// Before (blocking)
const hash = bcrypt.hashSync(password, 10);
// After (non-blocking)
const hash = await bcrypt.hash(password, 10);
// This fixes event loop blocking in Node.js
Pattern: JSON.stringify dominating
# Before (serializing entire object)
return jsonify(user.__dict__)
# After (select only needed fields)
return jsonify({"id": user.id, "name": user.name})
# Sending only the needed fields can cut serialization time dramatically for large models
Pattern: Repeated database queries
// Before (N+1 query pattern)
for (const order of orders) {
order.customer = await db.customer.findUnique({ id: order.customerId });
}
// After (batch with include)
const orders = await db.order.findMany({
include: { customer: true }
});
// Changes 100 queries to 1 query with join
Step 5: Verify the Improvement
Re-profile with same load:
# Capture new flame graph with same test
0x node server.js
# Run identical load test
Compare:
- Original bottleneck should be narrower or absent
- Overall graph width should be similar (same total time)
- Verify response time improvement in metrics
You should see:
- Target function reduced from 60%+ to <20% width
- 2-5x faster endpoint response times
- Lower CPU usage at same throughput
Verification
Measure impact:
# Before and after comparison
curl -w "@curl-format.txt" https://api.example.com/users
# curl-format.txt contains the single line:
# time_total: %{time_total}s\n
You should see: Response time reduced by 40-70% for CPU-bound endpoints
Production validation:
- Monitor P95 latency for 24 hours
- Check CPU utilization drops proportionally
- No increase in error rates
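P95 is just the nearest-rank percentile over your latency samples; a quick sketch for checking it from collected numbers (the latencies below are made up for illustration):

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value >= pct% of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# Hypothetical response times (ms) before and after the fix
before = [120, 130, 150, 180, 2100, 2200, 140, 160, 170, 2050]
after = [110, 120, 115, 130, 400, 420, 125, 135, 140, 410]

print(percentile(before, 95), percentile(after, 95))  # 2200 420
```

Note how the median barely moves in this example while P95 collapses: CPU bottlenecks often hurt the tail first, which is why the validation above watches P95 rather than the average.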
What You Learned
- Flame graphs show time distribution across your call stack
- Width matters more than height for finding bottlenecks
- AI can identify specific anti-patterns faster than manual analysis
- Fix the widest blocks first for maximum impact
Limitations:
- Flame graphs show CPU time, not I/O wait time (use async profilers for that)
- Profiling adds 5-15% overhead (safe for production but affects results)
- AI analysis quality depends on providing context (framework, known issues)
When NOT to use flame graphs:
- Database query optimization (use query analyzers instead)
- Memory leaks (use heap snapshots)
- Network latency issues (use distributed tracing)
Advanced: Differential Flame Graphs
Compare two profiles to see what changed:
# Capture baseline in collapsed-stack ("raw") format
py-spy record --format raw -o baseline.folded --pid $PID
# After code change
py-spy record --format raw -o after.folded --pid $PID
# Generate diff (requires Brendan Gregg's FlameGraph scripts)
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/difffolded.pl baseline.folded after.folded | \
./FlameGraph/flamegraph.pl > diff.svg
Diff colors:
- Red = Regressed (function got slower)
- Blue = Improved (function got faster)
- Gray = Unchanged
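Under the hood, a differential flame graph is a per-stack subtraction over two collapsed-stack files. The core idea can be sketched in Python (synthetic stacks and hypothetical frame names, echoing the Redis example below):

```python
def parse_folded(lines: list[str]) -> dict[str, int]:
    """Collapsed-stack lines ("a;b;c <count>") -> {stack: samples}."""
    out: dict[str, int] = {}
    for line in lines:
        stack, count = line.rsplit(" ", 1)
        out[stack] = out.get(stack, 0) + int(count)
    return out

baseline = parse_folded([
    "main;handler;db_query 60",
    "main;handler;serialize 10",
])
after = parse_folded([
    "main;handler;db_query 5",    # cache hits: far fewer DB samples
    "main;handler;serialize 40",  # but serialization grew
    "main;handler;redis_get 20",  # new frames from the cache client
])

# Positive delta = regressed (red in the diff), negative = improved (blue)
deltas = {s: after.get(s, 0) - baseline.get(s, 0)
          for s in set(baseline) | set(after)}
for stack, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{delta:+d}  {stack}")
```

Here the net change is near zero even though the DB query got much cheaper, which is exactly the "caching made it red" situation the prompt below asks the AI about.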
Upload the diff flame graph to AI:
"This diff flame graph shows before/after my optimization.
The red sections grew after I added Redis caching. Why?"
AI might identify: "The Redis client serialization (red) now exceeds the time saved from skipping database queries (blue). Consider using msgpack instead of JSON for cache values."
Flame Graph Cheat Sheet
Quick pattern recognition:
| Visual Pattern | Problem | Typical Fix |
|---|---|---|
| One wide plateau at top | Single bottleneck | Optimize that function |
| Many thin towers | Excessive recursion | Add memoization or iterative approach |
| Repeated identical patterns | N+1 queries or loops | Batch operations |
| Wide blocks in library code | Framework misconfiguration | Check docs for async/streaming APIs |
| Deep stacks (>20 levels) | Callback hell or deep recursion | Refactor to iterative or flatten promises |
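Some of the table's patterns can even be flagged mechanically from collapsed-stack data before you open the SVG. A toy heuristic for "deep stacks" and "wide plateaus" (the thresholds and sample data are arbitrary choices for illustration, not standards):

```python
def flag_stacks(folded_lines: list[str],
                depth_limit: int = 20,
                width_limit: float = 0.5) -> list[tuple[str, str]]:
    """Flag over-deep stacks and single wide leaves in collapsed-stack data."""
    parsed = []
    total = 0
    for line in folded_lines:
        stack, count_str = line.rsplit(" ", 1)
        frames, count = stack.split(";"), int(count_str)
        parsed.append((frames, count))
        total += count

    flags = []
    for frames, count in parsed:
        if len(frames) > depth_limit:
            flags.append(("deep-stack", frames[-1]))      # likely recursion
        if count / total > width_limit:
            flags.append(("wide-plateau", frames[-1]))    # single bottleneck
    return flags

sample = [
    "main;" + ";".join(f"recurse_{i}" for i in range(25)) + " 10",
    "main;handler;bcrypt_hash 70",
    "main;handler;parse 20",
]
print(flag_stacks(sample))
```

A script like this is a crude pre-filter; the visual graph (or an AI pass over it) still gives far more context about why a stack is deep or wide.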
Tools mentioned:
- 0x - Node.js profiler
- py-spy - Python profiler
- cargo-flamegraph - Rust profiler
- Brendan Gregg's FlameGraph - Original implementation
Tested on Node.js 22.x, Python 3.12, Rust 1.76, Linux & macOS