Last month, our main API endpoint went from 2.3s to 89ms response time. The fix? Proper database indexing. Here's the step-by-step process we used.
The Problem: A Real Slow Query
Our user dashboard was timing out. The culprit query:
12345678910SELECT u.*, p.title, p.created_at, COUNT(l.id) as likesFROM users uJOIN posts p ON u.id = p.user_idLEFT JOIN likes l ON p.id = l.post_idWHERE u.created_at > '2024-01-01'AND p.status = 'published'AND u.location LIKE '%New York%'GROUP BY u.id, p.idORDER BY p.created_at DESCLIMIT 20;
Execution time: 2,340ms on 500k users, 2M posts
Step 1: Analyze the Query Plan
Always start with EXPLAIN ANALYZE
:
12345678910EXPLAIN ANALYZE SELECT u.*, p.title, p.created_at, COUNT(l.id) as likesFROM users uJOIN posts p ON u.id = p.user_idLEFT JOIN likes l ON p.id = l.post_idWHERE u.created_at > '2024-01-01'AND p.status = 'published'AND u.location LIKE '%New York%'GROUP BY u.id, p.idORDER BY p.created_at DESCLIMIT 20;
The problem: Sequential scans on all three tables. No indexes were being used.
Step 2: Add Strategic Indexes
1. Compound Index for Users
123-- Index for the WHERE clause filtersCREATE INDEX idx_users_created_location ON users(created_at, location);-- Why this order? created_at is more selective than LIKE '%pattern%'
2. Compound Index for Posts
123-- Index for JOIN and WHERE conditionsCREATE INDEX idx_posts_user_status_created ON posts(user_id, status, created_at DESC);-- This covers the JOIN, WHERE, and ORDER BY in one index
3. Simple Index for Likes
12-- For the LEFT JOINCREATE INDEX idx_likes_post_id ON likes(post_id);
Step 3: Results After Indexing
Same query, new execution time: 89ms (96% improvement)
1234-- Query plan now shows:-- Index Scan using idx_users_created_location-- Nested Loop Join-- Index Scan using idx_posts_user_status_created
Index Design Principles
1. Column Order Matters
For compound indexes, put the most selective columns first:
123456-- Good: Selective column firstCREATE INDEX idx_orders_date_status ON orders(created_at, status);-- Bad: Less selective column firstCREATE INDEX idx_orders_status_date ON orders(status, created_at);-- Query: WHERE created_at > '2025-01-01' AND status = 'pending'-- The first index is much more efficient
2. Include Columns for Covering Indexes
PostgreSQL and SQL Server support INCLUDE columns:
12345678-- Covering index - no table lookup neededCREATE INDEX idx_posts_covering ON posts(user_id, status) INCLUDE (title, created_at, content);-- This query uses only the index:SELECT title, created_at, content FROM posts WHERE user_id = 123 AND status = 'published';
Common Indexing Mistakes
1. Over-Indexing
Don't create an index for every column:
123456789-- Bad: Too many single-column indexesCREATE INDEX idx_users_email ON users(email);CREATE INDEX idx_users_name ON users(name);CREATE INDEX idx_users_created ON users(created_at);CREATE INDEX idx_users_status ON users(status);-- Better: Strategic compound indexesCREATE INDEX idx_users_email ON users(email); -- For loginCREATE INDEX idx_users_name_status ON users(name, status); -- For searchCREATE INDEX idx_users_created_status ON users(created_at, status); -- For filtering
2. Ignoring String Patterns
LIKE queries need special consideration:
12345678-- This WON'T use a regular index:WHERE email LIKE '%@gmail.com'-- This WILL use an index:WHERE email LIKE 'john%'-- For suffix searches, consider:-- 1. Full-text search indexes-- 2. Reverse string indexes-- 3. Trigram indexes (PostgreSQL)
Monitoring Index Performance
PostgreSQL: Check Index Usage
12345-- Find unused indexesSELECT schemaname, tablename, indexname, idx_scanFROM pg_stat_user_indexesWHERE idx_scan = 0ORDER BY schemaname, tablename, indexname;
MySQL: Index Efficiency
12345678910-- Check index cardinalitySELECT TABLE_NAME,INDEX_NAME,CARDINALITY,SUB_PART,NULLABLEFROM INFORMATION_SCHEMA.STATISTICSWHERE TABLE_SCHEMA = 'your_database'ORDER BY TABLE_NAME, SEQ_IN_INDEX;
Real-World Index Maintenance
1. Regular ANALYZE/UPDATE STATISTICS
123456-- PostgreSQL: Update query planner statisticsANALYZE users;ANALYZE posts;-- SQL Server: Update statisticsUPDATE STATISTICS users;UPDATE STATISTICS posts;
2. Monitor Index Bloat
1234567891011-- PostgreSQL: Check index bloatSELECT schemaname,tablename,indexname,pg_size_pretty(pg_relation_size(indexrelid)) as index_size,idx_scan,idx_tup_read,idx_tup_fetchFROM pg_stat_user_indexesORDER BY pg_relation_size(indexrelid) DESC;
Performance Testing Script
We use this Node.js script to test query performance:
1234567891011121314151617181920const { Pool } = require('pg');const pool = new Pool({ connectionString: process.env.DATABASE_URL });async function testQuery(name, query) {const start = Date.now();try {const result = await pool.query(query);const duration = Date.now() - start;console.log(`${name}: ${duration}ms (${result.rows.length} rows)`);return { duration, rows: result.rows.length };} catch (error) {console.error(`${name} failed:`, error.message);}}// Test before and after indexingasync function runTests() {await testQuery('Dashboard Query', dashboardQuery);await testQuery('Search Query', searchQuery);await testQuery('Reports Query', reportsQuery);}runTests();
Key Takeaways
- Always use
EXPLAIN ANALYZE
to understand query execution - Design compound indexes with column order in mind
- Monitor index usage and remove unused indexes
- Test performance before and after changes
- Consider covering indexes for frequently accessed columns
Proper indexing isn't just about adding indexes—it's about understanding your queries and designing indexes that support them efficiently.