PostgreSQL Performance Tuning: Lessons from Production
Our PostgreSQL database hit a wall at 10 million rows. Queries that took 50ms started taking 5 seconds. Here's everything we did to fix it.
Understanding EXPLAIN ANALYZE
Before optimizing, understand what's slow:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 12345
AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 10;
Look for:
- Seq Scan - Full table scan (usually bad on large tables, fine on small ones)
- Index Scan - Using an index (good)
- Bitmap Heap Scan - Combining multiple index conditions (usually good)
- actual time - Real execution time per plan node
- rows - Compare the planner's estimate with the actual count; a large mismatch usually means stale statistics (run ANALYZE)
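The BUFFERS option is also worth enabling; it shows how much data each node read from shared buffers versus disk:

```sql
-- BUFFERS adds shared-buffer hit/read counts to each plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 12345
ORDER BY created_at DESC
LIMIT 10;
```

Lots of "read" blocks mean the data was not cached; repeatedly re-reading the same blocks points at a missing index or an undersized shared_buffers.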
Indexing Strategies
Single Column Index
-- Basic B-tree index
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Index for sorting
CREATE INDEX idx_orders_created_desc ON orders(created_at DESC);
Composite Index
Column order matters: a query can use the index only through a leftmost prefix of its columns, so put the columns you filter on most often (typically the most selective equality conditions) first:
-- For: WHERE user_id = X AND status = Y
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- This index also helps queries on just user_id
-- But NOT queries on just status
Partial Index
When you query a subset frequently:
-- Only index active orders (80% of queries)
CREATE INDEX idx_active_orders ON orders(user_id, created_at)
WHERE status = 'active';
-- Much smaller index, faster updates
Expression Index
-- For case-insensitive email lookup
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Query must match the expression
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
Query Optimization
Avoid SELECT *
-- Bad
SELECT * FROM orders WHERE user_id = 123;
-- Good: Only fetch what you need
SELECT id, total, status, created_at FROM orders WHERE user_id = 123;
Use EXISTS Instead of IN
-- Can be slow with a large, uncorrelated subquery (especially on older versions)
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 10000);
-- Often faster; recent PostgreSQL planners may produce the same plan for both, so verify with EXPLAIN
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.total > 10000);
Pagination Done Right
-- Bad: OFFSET for large values
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 50000;
-- Better: Keyset pagination
SELECT * FROM products
WHERE id > 50000 -- Last seen ID
ORDER BY id
LIMIT 20;
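When the sort column is not unique, keyset pagination needs a tiebreaker or rows can be skipped or repeated across pages. A sketch, assuming products also has a created_at column (not shown above):

```sql
-- Row comparison keeps pagination stable across duplicate timestamps;
-- the (created_at, id) values come from the last row of the previous page
SELECT * FROM products
WHERE (created_at, id) > ('2024-03-01 12:00:00', 48231)
ORDER BY created_at, id
LIMIT 20;
-- Backed by: CREATE INDEX idx_products_created_id ON products(created_at, id);
```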
Table Partitioning
For tables with millions of rows:
-- Create partitioned table
CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER,
    total DECIMAL(10,2),
    created_at TIMESTAMPTZ
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
PostgreSQL prunes partitions automatically: queries that filter on created_at scan only the partitions that can contain matching rows.
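Pruning is easy to verify: assuming the monthly partitions above exist, an EXPLAIN on a date-bounded query should list only the matching partition:

```sql
-- Only orders_2024_02 should appear in the plan
EXPLAIN SELECT * FROM orders
WHERE created_at >= '2024-02-01' AND created_at < '2024-03-01';
```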
Configuration Tuning
Key settings in postgresql.conf:
# Memory
shared_buffers = 256MB # Start at ~25% of RAM on a dedicated DB server (256MB suits ~1GB)
work_mem = 64MB # Per sort/hash operation, per connection - total usage can be many times this
maintenance_work_mem = 512MB # For VACUUM, CREATE INDEX
# Connections
max_connections = 100
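Settings can also be changed without editing postgresql.conf by hand, using ALTER SYSTEM (which writes to postgresql.auto.conf):

```sql
-- Persist a setting and reload the configuration
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();
-- Note: shared_buffers and max_connections only take effect after a full restart
```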
Maintenance
Regular VACUUM
-- Reclaim dead tuples
VACUUM ANALYZE orders;
-- Full vacuum: rewrites the table and returns space to the OS,
-- but takes an ACCESS EXCLUSIVE lock - avoid during peak traffic
VACUUM FULL orders;
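To see whether autovacuum is keeping up, the pg_stat_user_tables view reports dead-tuple counts and last-run timestamps:

```sql
-- Tables with the most dead tuples awaiting cleanup
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```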
Check for Bloat
-- Total relation size (table + indexes + TOAST) is a rough proxy for bloat;
-- for a precise estimate, use the pgstattuple extension
SELECT
schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
Monitoring Queries
Find slow queries:
-- Requires the pg_stat_statements extension:
-- add it to shared_preload_libraries, restart, then CREATE EXTENSION pg_stat_statements
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- On PostgreSQL 12 and older, the columns are mean_time and total_time
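To benchmark a change in isolation, the accumulated statistics can be cleared first:

```sql
-- Reset counters, run the workload, then re-query pg_stat_statements
SELECT pg_stat_statements_reset();
```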
Database optimization is iterative. Measure, change one thing, measure again.