Problem: Your JSONB Queries Are Crushing Production Performance
You're querying JSONB columns in PostgreSQL and seeing 800ms+ response times on tables with only 100K rows. The same query on regular columns would be instant, but your flexible schema is now a performance nightmare.
You'll learn:
- Why JSONB queries are slow without proper indexes
- How to use GIN indexes and expression indexes correctly
- How AI tools can suggest optimizations you'd miss manually
- PostgreSQL 18's new features for JSONB performance
Time: 12 min | Level: Intermediate
Why JSONB Queries Are Slow
JSONB stores data in a decomposed binary format that makes it slightly slower to input, but this isn't your main problem. The real issue is that without indexes, PostgreSQL must scan every row and parse the JSONB structure to find matches.
Common symptoms:
- Queries with data->>'field' = 'value' take seconds instead of milliseconds
- EXPLAIN ANALYZE shows "Seq Scan" instead of "Index Scan"
- CPU usage spikes during JSONB queries
- Performance degrades linearly as table size grows
What's happening: PostgreSQL can't use regular B-tree indexes on JSONB columns because the data structure is too complex. You need specialized indexes that understand JSONB's internal format.
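To make the examples below reproducible, here's a minimal test setup. The events table and its fields match the queries used throughout this article, but the data itself is synthetic:

```sql
-- Hypothetical test table matching the examples in this article
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    data JSONB NOT NULL
);

-- 100K rows of synthetic event data
INSERT INTO events (data)
SELECT jsonb_build_object(
    'user_id', (random() * 50000)::int::text,
    'status', (ARRAY['active','inactive'])[1 + (random() * 1)::int],
    'category', (ARRAY['signup','login','purchase'])[1 + (random() * 2)::int],
    'created_at', (NOW() - (random() * INTERVAL '30 days'))::text
)
FROM generate_series(1, 100000);

ANALYZE events;
```

With this in place you can run every EXPLAIN in this article against your own instance and compare plans before and after each index.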
Solution: Strategic Indexing + AI Analysis
Step 1: Identify Your Slow Query
First, find which queries are actually slow using PostgreSQL's built-in tracking:
-- Requires the pg_stat_statements extension: add it to
-- shared_preload_libraries (needs a restart), then run
-- CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
ALTER SYSTEM SET pg_stat_statements.track = 'all';
SELECT pg_reload_conf();
-- Find your slowest JSONB queries
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%->%' OR query LIKE '%@>%'
ORDER BY mean_exec_time DESC
LIMIT 10;
Expected: A list showing queries with high mean_exec_time (>100ms is concerning for JSONB queries).
Let's say you find this slow query:
-- Slow: 850ms on 100K rows
SELECT * FROM events
WHERE data->>'user_id' = '12345';
Step 2: Use AI to Analyze the Problem
Instead of guessing which index to create, use AI tools to analyze your query with full context. Here's how to use Aiven's free SQL optimizer:
Prepare your context:
-- Get your table structure
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE tablename = 'events';
-- Get column statistics
SELECT
attname,
n_distinct,
correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'data';
-- Get EXPLAIN output
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM events
WHERE data->>'user_id' = '12345';
Submit to AI optimizer:
- Visit https://aiven.io/tools/sql-query-optimizer
- Paste your query and EXPLAIN output
- Include table size and statistics
Why this works: AI can make intelligent recommendations based on column statistics and table characteristics, suggesting indexes that match your actual data distribution, not generic advice.
AI will likely suggest: Either a GIN index on the entire JSONB column, or an expression index on the specific field you're filtering.
Step 3: Choose the Right Index Type
PostgreSQL 18 supports several index types for JSONB. Here's when to use each:
For containment queries (@>, ?, ?|, ?&):
-- Default GIN index - supports all operators
CREATE INDEX idx_events_data_gin
ON events USING GIN (data);
-- Faster but limited - supports only @> operator
CREATE INDEX idx_events_data_pathops
ON events USING GIN (data jsonb_path_ops);
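These index types pay off for queries like the following (field names match the examples in this article):

```sql
-- Served by either GIN index above
SELECT * FROM events
WHERE data @> '{"status": "active"}';

-- Served only by the default GIN index
-- (jsonb_path_ops can't answer the ? existence operator)
SELECT * FROM events
WHERE data ? 'user_id';
```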
For specific field queries (->>):
-- Expression index - fastest for single field
CREATE INDEX idx_events_user_id
ON events ((data->>'user_id'));
-- Even better: extract to a real column (generated column)
ALTER TABLE events
ADD COLUMN user_id INTEGER
GENERATED ALWAYS AS ((data->>'user_id')::INTEGER) STORED;
CREATE INDEX idx_events_user_id_generated
ON events (user_id);
Why generated columns? PostgreSQL 18 adds virtual generated columns, and they're now the default kind: computed on-the-fly during reads, avoiding the write overhead of stored columns. Virtual columns can't be indexed, though, so declare the column STORED (as above) when you want an index on it.
For our slow query, the expression index is the fastest option:
-- This makes the query instant
CREATE INDEX idx_events_user_id
ON events ((data->>'user_id'));
Expected: Index creation takes 2-10 seconds on 100K rows. Now re-run your query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE data->>'user_id' = '12345';
You should see:
- "Index Scan using idx_events_user_id" (not Seq Scan)
- Execution time under 5ms (down from 850ms)
If it still uses Seq Scan:
- Problem: PostgreSQL thinks Seq Scan is cheaper (too many matching rows)
- Fix: The query is returning too many rows - add a LIMIT or more filters
- Verify stats: Run ANALYZE events; to update statistics
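One quick way to check whether the planner can use the index at all is to temporarily disable sequential scans in your session (a debugging technique only, never a production setting):

```sql
-- Force the planner to prefer the index if it is usable at all
SET enable_seqscan = off;
EXPLAIN (ANALYZE)
SELECT * FROM events
WHERE data->>'user_id' = '12345';
-- Index Scan now? The index works, the planner just judged Seq Scan
-- cheaper. Still Seq Scan? The index can't serve this query
-- (e.g. expression or type mismatch).
RESET enable_seqscan;
```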
Step 4: Optimize Complex Queries
For queries with multiple JSONB conditions, AI suggestions become crucial. Here's a real example:
-- Slow: Multiple JSONB filters
SELECT * FROM events
WHERE data->>'status' = 'active'
AND data->>'category' = 'signup'
AND (data->>'created_at')::timestamp > NOW() - INTERVAL '7 days';
AI analysis reveals: You need a composite strategy:
- GIN index for flexible containment
- Expression indexes for frequently filtered fields
- Hybrid approach with extracted columns
Optimized solution:
-- Create extracted columns for hot paths
ALTER TABLE events
ADD COLUMN status TEXT
GENERATED ALWAYS AS (data->>'status') STORED,
ADD COLUMN category TEXT
GENERATED ALWAYS AS (data->>'category') STORED,
ADD COLUMN created_at TIMESTAMP
GENERATED ALWAYS AS ((data->>'created_at')::TIMESTAMP) STORED;
-- Note: the text-to-timestamp cast is not immutable (it depends on
-- DateStyle/TimeZone settings), so PostgreSQL may reject this
-- expression; wrap the cast in an IMMUTABLE function if needed.
-- Composite index on extracted columns
CREATE INDEX idx_events_filters
ON events (status, category, created_at DESC);
-- Keep GIN index for ad-hoc queries
CREATE INDEX idx_events_data_gin
ON events USING GIN (data jsonb_path_ops);
Why both? The hybrid approach balances flexibility with performance - extracted columns give you fast, predictable queries for known patterns, while the GIN index handles exploratory or admin queries.
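For example, an ad-hoc admin query on a field you never extracted still benefits from the GIN index (the utm_source field here is hypothetical):

```sql
-- No dedicated column or expression index needed -
-- the jsonb_path_ops GIN index handles any containment probe
SELECT count(*) FROM events
WHERE data @> '{"utm_source": "newsletter"}';
```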
New query:
-- Now uses composite index: ~2ms
SELECT * FROM events
WHERE status = 'active'
AND category = 'signup'
AND created_at > NOW() - INTERVAL '7 days';
Step 5: Leverage PostgreSQL 18's Async I/O
PostgreSQL 18 introduces asynchronous I/O: instead of issuing one synchronous read at a time, the server can queue batches of reads ahead of need, improving throughput on scan-heavy workloads. This is especially helpful for JSONB queries that read many rows.
No configuration needed - it's automatic. But you can monitor it:
-- Check async I/O statistics (PostgreSQL 18+)
SELECT * FROM pg_stat_io
WHERE backend_type = 'client backend';
What you'll see: Lower I/O wait times on queries that scan multiple JSONB documents, especially on NVMe storage.
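If you do want to inspect or adjust how asynchronous I/O is performed, PostgreSQL 18 exposes the io_method setting (worker is the default; io_uring is available on Linux builds compiled with liburing):

```sql
-- Inspect the current async I/O mode
SHOW io_method;

-- Switch to io_uring (Linux only; requires a server restart)
ALTER SYSTEM SET io_method = 'io_uring';
```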
Verification
Test your optimizations with realistic workload:
# Install pgbench if not available
sudo apt install postgresql-contrib
# Create test script
cat > jsonb_test.sql << 'EOF'
SELECT * FROM events WHERE data->>'user_id' = :user_id::text;
EOF
# Run benchmark
pgbench -c 10 -j 2 -T 60 -f jsonb_test.sql \
-D user_id=12345 your_database
You should see:
- TPS (transactions per second) increased 50-100x
- Average latency under 5ms
- No "Seq Scan" in random EXPLAIN checks
Production monitoring:
-- Track index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'events'
ORDER BY idx_scan DESC;
Red flags:
- idx_scan = 0 means the index isn't used - check your query
- Very low idx_tup_fetch compared to idx_tup_read means poor selectivity
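A follow-up query along these lines can list removal candidates (check replicas too before dropping anything, since these statistics are per-server):

```sql
-- Indexes on events that have never been scanned
-- since statistics were last reset
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND relname = 'events'
ORDER BY pg_relation_size(indexrelid) DESC;
```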
What You Learned
- JSONB needs specialized indexes (GIN or expression) - regular B-tree doesn't work
- AI tools analyze column statistics and data distribution to suggest optimal indexes, not just generic patterns
- Expression indexes are fastest for single field filters: (data->>'field')
- PostgreSQL 18's virtual generated columns compute values on-the-fly, avoiding write overhead
- Hybrid schemas (columns + JSONB) give you performance where it matters and flexibility where you need it
Limitations:
- GIN indexes add 20-30% write overhead - don't over-index
- Expression indexes only work for exact expressions - data->>'field' won't use a data->'field' index
- AI suggestions need accurate statistics - run ANALYZE regularly
When NOT to use JSONB:
- Reporting/analytics on structured data (use regular columns or data warehouse)
- Frequently updated fields (JSONB updates are more expensive)
- When you need foreign key constraints on nested values
Next steps:
- Monitor index usage with pg_stat_user_indexes
- Use pg_stat_statements to continuously identify slow queries
- Consider partitioning large JSONB tables by a key field
- Explore PostgreSQL 18's improved B-tree skip scans for multi-column indexes
AI Tools for PostgreSQL Optimization
Free options:
- Aiven SQL Query Optimizer: https://aiven.io/tools/sql-query-optimizer (PostgreSQL + MySQL, no signup)
- PostgresAI: https://postgres.ai (schema-aware suggestions, monitors production)
- QuerySharp: https://querysharp.com (integrates with Cursor IDE)
How to use them effectively:
- Always provide EXPLAIN ANALYZE output, not just the query
- Include table statistics (pg_stats) and sizes
- Test suggestions in staging before production
- Verify the AI recommendations account for your actual hardware capacity
What AI can't do:
- Understand your business logic or access patterns
- Predict future query patterns
- Know which fields users will filter on next week
Use AI to generate candidates, but you decide which indexes to actually create based on your application's needs.
Common Pitfalls
"I created the index but it's still slow"
-- Wrong: Data type mismatch
CREATE INDEX idx_bad ON events ((data->>'user_id'));
-- Query uses integer comparison
WHERE (data->>'user_id')::INTEGER = 12345; -- Won't use index
-- Right: Match the query's cast
CREATE INDEX idx_good ON events (((data->>'user_id')::INTEGER));
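You can confirm which expression actually matches by comparing the two plans side by side:

```sql
-- With idx_good in place, the cast-matching form uses the index
EXPLAIN SELECT * FROM events
WHERE (data->>'user_id')::INTEGER = 12345;

-- The text-comparison form needs the uncast expression index instead
EXPLAIN SELECT * FROM events
WHERE data->>'user_id' = '12345';
```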
"GIN index is huge"
-- Check index size
SELECT
pg_size_pretty(pg_relation_size('idx_events_data_gin')) as index_size,
pg_size_pretty(pg_relation_size('events')) as table_size;
GIN indexes can be 50-100% of table size. If you always filter by a specific field, consider extracting it to a regular column instead of indexing the entire JSONB.
"AI suggested 10 indexes"
Don't create them all. Each index adds write overhead. Priority:
- Indexes on WHERE clause fields (biggest impact)
- Indexes on JOIN conditions
- Indexes on ORDER BY fields (only if frequently used)
Start with 1-2 indexes, measure impact, then add more only if needed.
Tested on PostgreSQL 18.2, 1M row dataset, Ubuntu 24.04 LTS with NVMe storage