PostgreSQL Performance Tuning: Lessons from Production
Our PostgreSQL database hit a wall at 10 million rows. Queries that took 50ms started taking 5 seconds. Here's everything we did to fix it.
Understanding EXPLAIN ANALYZE
Before optimizing, understand what's slow:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 10;
Look for:
- Seq Scan - Full table scan (usually bad on large tables, fine on small ones)
- Index Scan - Using an index (good)
- Bitmap Heap Scan - Combining multiple index conditions (usually good)
- actual time - Real execution time per plan node
- rows - Compare the planner's estimate with the actual count; a large mismatch usually means stale statistics (run ANALYZE)
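The BUFFERS option is also worth enabling; it shows how much data each node read from shared buffers versus disk:

```sql
-- BUFFERS adds shared-buffer hit/read counts to each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 12345
ORDER BY created_at DESC
LIMIT 10;
```

Lots of "read" blocks mean the data was not cached; repeatedly re-reading the same blocks points at a missing index or an undersized shared_buffers.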
Indexing Strategies
Single Column Index
-- Basic B-tree index
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Index for sorting
CREATE INDEX idx_orders_created_desc ON orders(created_at DESC);
Composite Index
Column order matters: a query can use the index only through a leftmost prefix of its columns, so put the columns you filter on most often (typically the most selective equality conditions) first:
-- For: WHERE user_id = X AND status = Y
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- This index also helps queries on just user_id
-- But NOT queries on just status
Partial Index
When you query a subset frequently:
-- Only index active orders (80% of queries)
CREATE INDEX idx_active_orders ON orders(user_id, created_at)
WHERE status = 'active';
-- Much smaller index, faster updates
Expression Index
-- For case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Query must match the expression
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
Query Optimization
Avoid SELECT *
-- Bad
SELECT * FROM orders WHERE user_id = 123;
-- Good: Only fetch what you need
SELECT id, total, status, created_at FROM orders WHERE user_id = 123;
Use EXISTS Instead of IN
-- Can be slow with a large, uncorrelated subquery (especially on older versions)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 10000);
-- Often faster; recent PostgreSQL planners may produce the same plan for both, so verify with EXPLAIN
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.total > 10000);
Pagination Done Right
-- Bad: OFFSET for large values
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 50000;
-- Better: Keyset pagination
SELECT * FROM products
WHERE id > 50000 -- Last seen ID
ORDER BY id
LIMIT 20;
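When the sort column is not unique, keyset pagination needs a tiebreaker or rows can be skipped or repeated across pages. A sketch, assuming products also has a created_at column (not shown above):

```sql
-- Row comparison keeps pagination stable across duplicate timestamps;
-- the (created_at, id) values come from the last row of the previous page
SELECT * FROM products
WHERE (created_at, id) > ('2024-03-01 12:00:00', 48231)
ORDER BY created_at, id
LIMIT 20;
-- Backed by: CREATE INDEX idx_products_created_id ON products(created_at, id);
```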
Table Partitioning
For tables with millions of rows:
-- Create partitioned table
CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER,
    total DECIMAL(10,2),
    created_at TIMESTAMPTZ
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
PostgreSQL prunes partitions automatically: queries that filter on created_at scan only the partitions that can contain matching rows.
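Pruning is easy to verify: assuming the monthly partitions above exist, an EXPLAIN on a date-bounded query should list only the matching partition:

```sql
-- Only orders_2024_02 should appear in the plan
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-02-01' AND created_at < '2024-03-01';
```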
Configuration Tuning
Key settings in postgresql.conf:
# Memory
shared_buffers = 256MB # Start at ~25% of RAM on a dedicated DB server (256MB suits ~1GB)
work_mem = 64MB # Per sort/hash operation, per connection - total usage can be many times this
maintenance_work_mem = 512MB # For VACUUM, CREATE INDEX
# Connections
max_connections = 100
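Settings can also be changed without editing postgresql.conf by hand, using ALTER SYSTEM (which writes to postgresql.auto.conf):

```sql
-- Persist a setting and reload the configuration
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();
-- Note: shared_buffers and max_connections only take effect after a full restart
```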
Maintenance
Regular VACUUM
-- Reclaim dead tuples
VACUUM ANALYZE orders;
-- Full vacuum: rewrites the table and returns space to the OS,
-- but takes an ACCESS EXCLUSIVE lock - avoid during peak traffic
VACUUM FULL orders;
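To see whether autovacuum is keeping up, the pg_stat_user_tables view reports dead-tuple counts and last-run timestamps:

```sql
-- Tables with the most dead tuples awaiting cleanup
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```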
Check for Bloat
-- Total relation size (table + indexes + TOAST) is a rough proxy for bloat;
-- for a precise estimate, use the pgstattuple extension
SELECT
schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
Monitoring Queries
Find slow queries:
-- Requires the pg_stat_statements extension:
-- add it to shared_preload_libraries, restart, then CREATE EXTENSION pg_stat_statements
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- On PostgreSQL 12 and older, the columns are mean_time and total_time
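To benchmark a change in isolation, the accumulated statistics can be cleared first:

```sql
-- Reset counters, run the workload, then re-query pg_stat_statements
SELECT pg_stat_statements_reset();
```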
Database optimization is iterative. Measure, change one thing, measure again.