Problem: Your JSONB Queries Are Crushing Production Performance
You're querying JSONB columns in PostgreSQL and seeing 800ms+ response times on tables with only 100K rows. The same query on regular columns would be instant, but your flexible schema is now a performance nightmare.
You'll learn:
- Why JSONB queries are slow without proper indexes
- How to use GIN indexes and expression indexes correctly
- How AI tools can suggest optimizations you'd miss manually
- PostgreSQL 18's new features for JSONB performance
Time: 12 min | Level: Intermediate
Why JSONB Queries Are Slow
JSONB stores data in a decomposed binary format that makes it slightly slower to input, but this isn't your main problem. The real issue is that without indexes, PostgreSQL must scan every row and parse the JSONB structure to find matches.
Common symptoms:
- Queries with data->>'field' = 'value' take seconds instead of milliseconds
- EXPLAIN ANALYZE shows "Seq Scan" instead of "Index Scan"
- CPU usage spikes during JSONB queries
- Performance degrades linearly as table size grows
What's happening: PostgreSQL can't use regular B-tree indexes on JSONB columns because the data structure is too complex. You need specialized indexes that understand JSONB's internal format.
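To make the examples below reproducible, here's a minimal test setup. The events table and its fields match the queries used throughout this article, but the data itself is synthetic:

```sql
-- Hypothetical test table matching the examples in this article
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    data JSONB NOT NULL
);

-- 100K rows of synthetic event data
INSERT INTO events (data)
SELECT jsonb_build_object(
    'user_id', (random() * 50000)::int::text,
    'status', (ARRAY['active','inactive'])[1 + (random() * 1)::int],
    'category', (ARRAY['signup','login','purchase'])[1 + (random() * 2)::int],
    'created_at', (NOW() - (random() * INTERVAL '30 days'))::text
)
FROM generate_series(1, 100000);

ANALYZE events;
```

With this in place you can run every EXPLAIN in this article against your own instance and compare plans before and after each index.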
Solution: Strategic Indexing + AI Analysis
Step 1: Identify Your Slow Query
First, find which queries are actually slow using PostgreSQL's built-in tracking:
-- Requires the pg_stat_statements extension: add it to
-- shared_preload_libraries (needs a restart), then run
-- CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
ALTER SYSTEM SET pg_stat_statements.track = 'all';
SELECT pg_reload_conf();
-- Find your slowest JSONB queries
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%->%' OR query LIKE '%@>%'
ORDER BY mean_exec_time DESC
LIMIT 10;
Expected: A list showing queries with high mean_exec_time (>100ms is concerning for JSONB queries).
Let's say you find this slow query:
-- Slow: 850ms on 100K rows
SELECT * FROM events
WHERE data->>'user_id' = '12345';
Step 2: Use AI to Analyze the Problem
Instead of guessing which index to create, use AI tools to analyze your query with full context. Here's how to use Aiven's free SQL optimizer:
Prepare your context:
-- Get your table structure
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE tablename = 'events';
-- Get column statistics
SELECT
attname,
n_distinct,
correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'data';
-- Get EXPLAIN output
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM events
WHERE data->>'user_id' = '12345';
Submit to AI optimizer:
- Visit https://aiven.io/tools/sql-query-optimizer
- Paste your query and EXPLAIN output
- Include table size and statistics
Why this works: AI can make intelligent recommendations based on column statistics and table characteristics, suggesting indexes that match your actual data distribution, not generic advice.
AI will likely suggest: Either a GIN index on the entire JSONB column, or an expression index on the specific field you're filtering.
Step 3: Choose the Right Index Type
PostgreSQL 18 supports several index types for JSONB. Here's when to use each:
For containment queries (@>, ?, ?|, ?&):
-- Default GIN index - supports all operators
CREATE INDEX idx_events_data_gin
ON events USING GIN (data);
-- Faster but limited - supports only @> operator
CREATE INDEX idx_events_data_pathops
ON events USING GIN (data jsonb_path_ops);
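These index types pay off for queries like the following (field names match the examples in this article):

```sql
-- Served by either GIN index above
SELECT * FROM events
WHERE data @> '{"status": "active"}';

-- Served only by the default GIN index
-- (jsonb_path_ops can't answer the ? existence operator)
SELECT * FROM events
WHERE data ? 'user_id';
```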
For specific field queries (->>):
-- Expression index - fastest for single field
CREATE INDEX idx_events_user_id
ON events ((data->>'user_id'));
-- Even better: extract to a real column (generated column)
ALTER TABLE events
ADD COLUMN user_id INTEGER
GENERATED ALWAYS AS ((data->>'user_id')::INTEGER) STORED;
CREATE INDEX idx_events_user_id_generated
ON events (user_id);
Why generated columns? PostgreSQL 18 adds virtual generated columns, and they're now the default kind: computed on-the-fly during reads, avoiding the write overhead of stored columns. Virtual columns can't be indexed, though, so declare the column STORED (as above) when you want an index on it.
For our slow query, the expression index is the fastest option:
-- This makes the query instant
CREATE INDEX idx_events_user_id
ON events ((data->>'user_id'));
Expected: Index creation takes 2-10 seconds on 100K rows. Now re-run your query:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE data->>'user_id' = '12345';
You should see:
- "Index Scan using idx_events_user_id" (not Seq Scan)
- Execution time under 5ms (down from 850ms)
If it still uses Seq Scan:
- Problem: PostgreSQL thinks Seq Scan is cheaper (too many matching rows)
- Fix: The query is returning too many rows - add a LIMIT or more filters
- Verify stats: Run ANALYZE events; to update statistics
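One quick way to check whether the planner can use the index at all is to temporarily disable sequential scans in your session (a debugging technique only, never a production setting):

```sql
-- Force the planner to prefer the index if it is usable at all
SET enable_seqscan = off;
EXPLAIN (ANALYZE)
SELECT * FROM events
WHERE data->>'user_id' = '12345';
-- Index Scan now? The index works, the planner just judged Seq Scan
-- cheaper. Still Seq Scan? The index can't serve this query
-- (e.g. expression or type mismatch).
RESET enable_seqscan;
```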
Step 4: Optimize Complex Queries
For queries with multiple JSONB conditions, AI suggestions become crucial. Here's a real example:
-- Slow: Multiple JSONB filters
SELECT * FROM events
WHERE data->>'status' = 'active'
AND data->>'category' = 'signup'
AND (data->>'created_at')::timestamp > NOW() - INTERVAL '7 days';
AI analysis reveals: You need a composite strategy:
- GIN index for flexible containment
- Expression indexes for frequently filtered fields
- Hybrid approach with extracted columns
Optimized solution:
-- Create extracted columns for hot paths
ALTER TABLE events
ADD COLUMN status TEXT
GENERATED ALWAYS AS (data->>'status') STORED,
ADD COLUMN category TEXT
GENERATED ALWAYS AS (data->>'category') STORED,
ADD COLUMN created_at TIMESTAMP
GENERATED ALWAYS AS ((data->>'created_at')::TIMESTAMP) STORED;
-- Note: the text-to-timestamp cast is not immutable (it depends on
-- DateStyle/TimeZone settings), so PostgreSQL may reject this
-- expression; wrap the cast in an IMMUTABLE function if needed.
-- Composite index on extracted columns
CREATE INDEX idx_events_filters
ON events (status, category, created_at DESC);
-- Keep GIN index for ad-hoc queries
CREATE INDEX idx_events_data_gin
ON events USING GIN (data jsonb_path_ops);
Why both? The hybrid approach balances flexibility with performance - extracted columns give you fast, predictable queries for known patterns, while the GIN index handles exploratory or admin queries.
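For example, an ad-hoc admin query on a field you never extracted still benefits from the GIN index (the utm_source field here is hypothetical):

```sql
-- No dedicated column or expression index needed -
-- the jsonb_path_ops GIN index handles any containment probe
SELECT count(*) FROM events
WHERE data @> '{"utm_source": "newsletter"}';
```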
New query:
-- Now uses composite index: ~2ms
SELECT * FROM events
WHERE status = 'active'
AND category = 'signup'
AND created_at > NOW() - INTERVAL '7 days';
Step 5: Leverage PostgreSQL 18's Async I/O
PostgreSQL 18 introduces asynchronous I/O: instead of issuing one synchronous read at a time, the server can queue batches of reads ahead of need, improving throughput on scan-heavy workloads. This is especially helpful for JSONB queries that read many rows.
No configuration needed - it's automatic. But you can monitor it:
-- Check async I/O statistics (PostgreSQL 18+)
SELECT * FROM pg_stat_io
WHERE backend_type = 'client backend';
What you'll see: Lower I/O wait times on queries that scan multiple JSONB documents, especially on NVMe storage.
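If you do want to inspect or adjust how asynchronous I/O is performed, PostgreSQL 18 exposes the io_method setting (worker is the default; io_uring is available on Linux builds compiled with liburing):

```sql
-- Inspect the current async I/O mode
SHOW io_method;

-- Switch to io_uring (Linux only; requires a server restart)
ALTER SYSTEM SET io_method = 'io_uring';
```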
Verification
Test your optimizations with realistic workload:
# Install pgbench if not available
sudo apt install postgresql-contrib
# Create test script
cat > jsonb_test.sql << 'EOF'
SELECT * FROM events WHERE data->>'user_id' = :user_id::text;
EOF
# Run benchmark
pgbench -c 10 -j 2 -T 60 -f jsonb_test.sql \
-D user_id=12345 your_database
You should see:
- TPS (transactions per second) increased 50-100x
- Average latency under 5ms
- No "Seq Scan" in random EXPLAIN checks
Production monitoring:
-- Track index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename = 'events'
ORDER BY idx_scan DESC;
Red flags:
- idx_scan = 0 means the index isn't used - check your query
- Very low idx_tup_fetch compared to idx_tup_read means poor selectivity
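A follow-up query along these lines can list removal candidates (check replicas too before dropping anything, since these statistics are per-server):

```sql
-- Indexes on events that have never been scanned
-- since statistics were last reset
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND relname = 'events'
ORDER BY pg_relation_size(indexrelid) DESC;
```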
What You Learned
- JSONB needs specialized indexes (GIN or expression) - regular B-tree doesn't work
- AI tools analyze column statistics and data distribution to suggest optimal indexes, not just generic patterns
- Expression indexes are fastest for single field filters: (data->>'field')
- PostgreSQL 18's virtual generated columns compute values on-the-fly, avoiding write overhead
- Hybrid schemas (columns + JSONB) give you performance where it matters and flexibility where you need it
Limitations:
- GIN indexes add 20-30% write overhead - don't over-index
- Expression indexes only work for exact expressions - data->>'field' won't use a data->'field' index
- AI suggestions need accurate statistics - run ANALYZE regularly
When NOT to use JSONB:
- Reporting/analytics on structured data (use regular columns or data warehouse)
- Frequently updated fields (JSONB updates are more expensive)
- When you need foreign key constraints on nested values
Next steps:
- Monitor index usage with pg_stat_user_indexes
- Use pg_stat_statements to continuously identify slow queries
- Consider partitioning large JSONB tables by a key field
- Explore PostgreSQL 18's improved B-tree skip scans for multi-column indexes
AI Tools for PostgreSQL Optimization
Free options:
- Aiven SQL Query Optimizer: https://aiven.io/tools/sql-query-optimizer (PostgreSQL + MySQL, no signup)
- PostgresAI: https://postgres.ai (schema-aware suggestions, monitors production)
- QuerySharp: https://querysharp.com (integrates with Cursor IDE)
How to use them effectively:
- Always provide EXPLAIN ANALYZE output, not just the query
- Include table statistics (pg_stats) and sizes
- Test suggestions in staging before production
- Verify the AI recommendations account for your actual hardware capacity
What AI can't do:
- Understand your business logic or access patterns
- Predict future query patterns
- Know which fields users will filter on next week
Use AI to generate candidates, but you decide which indexes to actually create based on your application's needs.
Common Pitfalls
"I created the index but it's still slow"
-- Wrong: Data type mismatch
CREATE INDEX idx_bad ON events ((data->>'user_id'));
-- Query uses integer comparison
WHERE (data->>'user_id')::INTEGER = 12345; -- Won't use index
-- Right: Match the query's cast
CREATE INDEX idx_good ON events (((data->>'user_id')::INTEGER));
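You can confirm which expression actually matches by comparing the two plans side by side:

```sql
-- With idx_good in place, the cast-matching form uses the index
EXPLAIN SELECT * FROM events
WHERE (data->>'user_id')::INTEGER = 12345;

-- The text-comparison form needs the uncast expression index instead
EXPLAIN SELECT * FROM events
WHERE data->>'user_id' = '12345';
```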
"GIN index is huge"
-- Check index size
SELECT
pg_size_pretty(pg_relation_size('idx_events_data_gin')) as index_size,
pg_size_pretty(pg_relation_size('events')) as table_size;
GIN indexes can be 50-100% of table size. If you always filter by a specific field, consider extracting it to a regular column instead of indexing the entire JSONB.
"AI suggested 10 indexes"
Don't create them all. Each index adds write overhead. Priority:
- Indexes on WHERE clause fields (biggest impact)
- Indexes on JOIN conditions
- Indexes on ORDER BY fields (only if frequently used)
Start with 1-2 indexes, measure impact, then add more only if needed.
Tested on PostgreSQL 18.2, 1M row dataset, Ubuntu 24.04 LTS with NVMe storage