SQL Optimization

Optimize SQL query performance across PostgreSQL, MySQL, and SQL Server through execution plan analysis, strategic indexing, and query rewriting.

When to Use

Use when:

  • Debugging slow queries or database timeouts
  • Analyzing EXPLAIN plans or execution plans
  • Determining index requirements
  • Rewriting inefficient queries
  • Identifying query anti-patterns (N+1, SELECT *, correlated subqueries)
  • Addressing database-specific optimization needs (PostgreSQL, MySQL, SQL Server)

Key Features

Core Optimization Workflow

  1. Analyze Query Performance: Run EXPLAIN ANALYZE to identify bottlenecks
  2. Identify Opportunities: Look for sequential scans, high row counts, inefficient joins
  3. Apply Indexing Strategies: Add strategic indexes based on WHERE, JOIN, ORDER BY
  4. Rewrite Queries: Eliminate anti-patterns and optimize query structure
  5. Verify Improvement: Re-run EXPLAIN and compare performance
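
For step 1, PostgreSQL's EXPLAIN accepts additional options; a minimal sketch using the orders table from the examples below, with BUFFERS for I/O detail:

-- ANALYZE executes the query and reports actual row counts and timings;
-- BUFFERS adds shared-buffer and disk I/O statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status
FROM orders
WHERE customer_id = 123;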

Common Red Flags

Indicator | Problem | Solution
Seq Scan / Table Scan | Full table scan on a large table | Add index on filter columns
High row count | Processing excessive rows | Add WHERE filter or index
Nested Loop with a large outer relation | Inefficient join algorithm | Index the join columns
Correlated subquery | Subquery executes per row | Rewrite as JOIN or EXISTS
Sort on large result set | Expensive sorting | Add index matching ORDER BY

Quick Start

Analyze Query Performance

PostgreSQL:

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;

Output Analysis:

Seq Scan on orders (cost=0.00..2500.00 rows=10)
Filter: (customer_id = 123)
Rows Removed by Filter: 99990

Problem: The sequential scan reads the entire table and discards 99,990 rows that do not match the filter

Add Composite Index

CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);

Verify Improvement

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;

Result: roughly 200x faster in this example (about 2,000 ms → 10 ms)

Index Scan using idx_orders_customer_created (cost=0.42..12.44 rows=10)
Index Cond: (customer_id = 123)

Indexing Strategies

Index Decision Framework

Is column used in WHERE, JOIN, ORDER BY, or GROUP BY?
├─ YES → Is column selective (many unique values)?
│   ├─ YES → Is table frequently queried?
│   │   ├─ YES → ADD INDEX
│   │   └─ NO → Consider based on query frequency
│   └─ NO (low selectivity) → Skip index
└─ NO → Skip index
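
To answer the selectivity question in the framework, a quick estimate is the ratio of distinct values to total rows (a sketch, using the orders table from the examples):

-- Values near 1.0 indicate a highly selective column (a good index candidate);
-- values near 0 indicate low selectivity
SELECT COUNT(DISTINCT customer_id) * 1.0 / COUNT(*) AS selectivity
FROM orders;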

Composite Index Design

Column Order Matters:

  1. Equality filters first (most selective)
  2. Additional equality filters (by selectivity)
  3. Range filters or ORDER BY (last)

Example:

-- Query pattern
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;

-- Optimal composite index
CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC);

Index Types by Use Case

PostgreSQL:

  • B-tree (default): General-purpose, supports <, ≤, =, ≥, >, BETWEEN, IN
  • GIN: Full-text search, JSONB, arrays
  • GiST: Spatial data, geometric types
  • BRIN: Very large tables with naturally ordered data

MySQL:

  • B-tree (default): General-purpose index
  • Full-text: Text search on VARCHAR/TEXT columns
  • Spatial: Spatial data types
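
For example, a GIN index supports containment queries on a JSONB column (PostgreSQL; the events table and payload column here are hypothetical):

CREATE INDEX idx_events_payload
ON events USING GIN (payload);

-- The index can serve containment predicates such as @>
SELECT * FROM events
WHERE payload @> '{"type": "signup"}';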

Query Anti-Patterns

1. SELECT * (Over-fetching)

-- ❌ Bad: Fetches all columns
SELECT * FROM users WHERE id = 1;

-- ✅ Good: Fetch only needed columns
SELECT id, name, email FROM users WHERE id = 1;

2. N+1 Queries

-- ❌ Bad: 1 + N queries
SELECT * FROM users LIMIT 100;
-- Then in loop: SELECT * FROM posts WHERE user_id = ?;

-- ✅ Good: Single JOIN
SELECT users.*, posts.id AS post_id, posts.title
FROM users
LEFT JOIN posts ON users.id = posts.user_id;

3. Non-Sargable Queries

Functions on indexed columns prevent index usage:

-- ❌ Bad: Function prevents index usage
SELECT * FROM orders WHERE YEAR(created_at) = 2025;

-- ✅ Good: Sargable range condition
SELECT * FROM orders
WHERE created_at >= '2025-01-01'
AND created_at < '2026-01-01';

4. Correlated Subqueries

-- ❌ Bad: Subquery executes per row
SELECT name,
(SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id)
FROM users;

-- ✅ Good: JOIN with GROUP BY
SELECT users.name, COUNT(orders.id) AS order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;

Database-Specific Optimizations

PostgreSQL-Specific Features

Partial Indexes (index subset of rows):

CREATE INDEX idx_active_users_login
ON users (last_login)
WHERE status = 'active';
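
The partial index is only considered when the query's WHERE clause implies the index predicate, for example:

-- Matches the predicate status = 'active', so the partial index is usable
SELECT id FROM users
WHERE status = 'active'
AND last_login < NOW() - INTERVAL '90 days';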

Expression Indexes (index computed values):

CREATE INDEX idx_users_email_lower
ON users (LOWER(email));
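
Queries must apply the same expression for this index to be used, for example:

-- Uses idx_users_email_lower because the predicate matches the indexed expression
SELECT id, name FROM users
WHERE LOWER(email) = 'alice@example.com';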

Covering Indexes (avoid heap access):

CREATE INDEX idx_users_email_covering
ON users (email) INCLUDE (id, name);
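
When a query touches only columns stored in the index, PostgreSQL can answer it with an index-only scan (subject to visibility-map freshness), for example:

-- All referenced columns (email, id, name) are covered by the index
SELECT id, name FROM users
WHERE email = 'alice@example.com';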

MySQL-Specific Features

Index Hints (override optimizer):

SELECT * FROM orders USE INDEX (idx_orders_customer)
WHERE customer_id = 123;

SQL Server-Specific Features

Query Store (track query performance over time):

ALTER DATABASE YourDatabase SET QUERY_STORE = ON;
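
Once Query Store is enabled, its catalog views can be queried directly; a minimal sketch listing the slowest statements by average duration:

-- avg_duration is reported in microseconds
SELECT TOP 10
    qt.query_sql_text,
    rs.avg_duration / 1000.0 AS avg_duration_ms,
    rs.count_executions
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p ON rs.plan_id = p.plan_id
JOIN sys.query_store_query AS q ON p.query_id = q.query_id
JOIN sys.query_store_query_text AS qt ON q.query_text_id = qt.query_text_id
ORDER BY rs.avg_duration DESC;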

Advanced Techniques

EXISTS vs IN for Subqueries

Use EXISTS when the subquery only needs to confirm a match; it often performs better with large subquery results, though modern planners may rewrite both forms as semi-joins, so verify with EXPLAIN:

-- ✅ Good: EXISTS stops at first match
SELECT * FROM users
WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id);

-- ❌ Potentially less efficient: IN may materialize the full subquery result
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);

Common Table Expressions (CTEs)

Break complex queries into readable parts:

WITH active_customers AS (
    SELECT id, name FROM customers WHERE status = 'active'
),
recent_orders AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY customer_id
)
SELECT ac.name, COALESCE(ro.order_count, 0) AS orders
FROM active_customers ac
LEFT JOIN recent_orders ro ON ac.id = ro.customer_id;

Monitoring and Maintenance

Regular Optimization Tasks:

  • Review slow query logs weekly
  • Update database statistics regularly (ANALYZE in PostgreSQL)
  • Monitor index usage (drop unused indexes)
  • Archive old data to keep tables manageable
  • Review execution plans for critical queries quarterly
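
For the index-usage task, PostgreSQL's statistics views can surface indexes that have never been scanned (a sketch; confirm an index is not needed for constraints or infrequent reports before dropping it):

SELECT schemaname, relname AS table_name, indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;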

PostgreSQL Statistics Update:

ANALYZE table_name;

MySQL Statistics Update:

ANALYZE TABLE table_name;
