A 40-Line Fix That Eliminated a 400x Performance Gap
Last month, I found a query that took 2.8 seconds to run. After a 40-line fix, it completed in 7 milliseconds. Same result, 400x faster. Here’s how.
The Problem
We had a dashboard showing user activity. Simple enough:
SELECT
  users.name,
  COUNT(events.id) AS event_count,
  MAX(events.created_at) AS last_active
FROM users
LEFT JOIN events ON events.user_id = users.id
WHERE users.created_at > NOW() - INTERVAL '30 days'
GROUP BY users.id
ORDER BY event_count DESC
LIMIT 100;
For 50,000 users and 10 million events: 2,800ms.
The kicker? Only 100 rows needed. Yet we were processing millions.
The Investigation
EXPLAIN ANALYZE ...
Limit  (cost=45678.23..45678.48 rows=100 width=52)
       (actual time=2847.234..2847.289 rows=100 loops=1)
  ->  Sort  (cost=45678.23..45803.41 rows=50072 width=52)
            (actual time=2847.231..2847.258 rows=100 loops=1)
        Sort Key: (count(events.id)) DESC
        ->  HashAggregate  (cost=41234.56..41735.28 rows=50072 width=52)
                           (actual time=2789.123..2820.456 rows=50072 loops=1)
              ->  Hash Left Join  (cost=1234.45..38901.23 rows=9827456 width=28)
                                  (actual time=45.678..2234.567 rows=9827456 loops=1)
The problem: Hash Left Join of 10 million event rows.
We’re joining EVERYTHING, then filtering, then sorting, then limiting. Backwards.
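To make the row explosion concrete, here's a small sqlite3 sketch (a stand-in for the Postgres setup; table names match the article, the data is invented and scaled down) that counts how many rows the join itself materializes versus how many rows the dashboard actually needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
""")
# 3 users with 1000 events each -- a tiny stand-in for 50k users / 10M events
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(i, f"user{i}", "2024-01-01") for i in range(3)])
conn.executemany("INSERT INTO events (user_id, created_at) VALUES (?, ?)",
                 [(i % 3, "2024-01-02") for i in range(3000)])

# Rows the LEFT JOIN produces before any aggregation happens:
joined = conn.execute("""
    SELECT COUNT(*) FROM users LEFT JOIN events ON events.user_id = users.id
""").fetchone()[0]

# Rows the dashboard actually needs:
top = conn.execute("""
    SELECT users.name, COUNT(events.id) AS event_count
    FROM users LEFT JOIN events ON events.user_id = users.id
    GROUP BY users.id ORDER BY event_count DESC LIMIT 2
""").fetchall()

print(joined, len(top))  # 3000 intermediate rows to produce 2 output rows
```

The ratio only gets worse with scale: the intermediate row count grows with the events table, while the output stays fixed at the LIMIT.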
The Fix
Flip the logic: get the top users first, THEN join their events.
WITH top_users AS (
  SELECT id, name
  FROM users
  WHERE created_at > NOW() - INTERVAL '30 days'
  LIMIT 100
),
user_events AS (
  SELECT
    user_id,
    COUNT(*) AS event_count,
    MAX(created_at) AS last_active
  FROM events
  WHERE user_id IN (SELECT id FROM top_users)
  GROUP BY user_id
)
SELECT
  u.name,
  COALESCE(e.event_count, 0) AS event_count,
  e.last_active
FROM top_users u
LEFT JOIN user_events e ON e.user_id = u.id
ORDER BY event_count DESC;
Wait, this is wrong. We’re limiting users BEFORE counting events. We need users with the MOST events, but we don’t know that until after counting.
Let me try again:
WITH user_event_counts AS (
  SELECT
    user_id,
    COUNT(*) AS event_count,
    MAX(created_at) AS last_active
  FROM events
  WHERE user_id IN (
    SELECT id
    FROM users
    WHERE created_at > NOW() - INTERVAL '30 days'
  )
  GROUP BY user_id
  ORDER BY event_count DESC
  LIMIT 100
)
SELECT
  u.name,
  e.event_count,
  e.last_active
FROM user_event_counts e
JOIN users u ON u.id = e.user_id
ORDER BY e.event_count DESC;
Still no good. The results are correct this time, but the CTE aggregates events for every recent user before the LIMIT applies, so we still churn through nearly all 10 million event rows.
The REAL fix:
-- Step 1: Composite index so each user's events are one cheap index scan
CREATE INDEX idx_events_user_created
  ON events(user_id, created_at);

-- Step 2: Rewrite the query to use it
SELECT
  users.name,
  event_counts.count AS event_count,
  event_counts.last_active
FROM users
JOIN LATERAL (
  SELECT
    COUNT(*) AS count,
    MAX(created_at) AS last_active
  FROM events
  WHERE events.user_id = users.id
) event_counts ON true
WHERE users.created_at > NOW() - INTERVAL '30 days'
ORDER BY event_counts.count DESC
LIMIT 100;
Result: 7ms.
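SQLite has no LATERAL, but the same per-user shape can be sketched with correlated scalar subqueries, which is handy for testing the logic locally (a stand-in for the article's Postgres query; table names match, data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
CREATE INDEX idx_events_user_created ON events(user_id, created_at);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(i, f"user{i}", "2024-06-01") for i in range(5)])
# user u gets u*3 events, one per day starting 2024-06-01
rows = [(u, f"2024-06-{d + 1:02d}") for u in range(5) for d in range(u * 3)]
conn.executemany("INSERT INTO events (user_id, created_at) VALUES (?, ?)", rows)

# One index probe per user, same shape as the LATERAL subquery above
per_user = conn.execute("""
    SELECT u.name,
           (SELECT COUNT(*) FROM events e WHERE e.user_id = u.id) AS event_count,
           (SELECT MAX(created_at) FROM events e WHERE e.user_id = u.id) AS last_active
    FROM users u
    ORDER BY event_count DESC
    LIMIT 3
""").fetchall()
print(per_user)
```

Each subquery resolves via `idx_events_user_created`, so no full scan of `events` is needed per user.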
What Changed?
Before:
- Load ALL users (50k rows)
- Join with ALL events (10M rows)
- Aggregate (process 10M rows)
- Sort (50k results)
- Limit to 100
After:
- Scan users with index
- For each user, count their events using index (LATERAL join)
- Keep running top-100 in memory
- Stop after finding top 100
The LATERAL join plus smart ordering meant Postgres only processed ~500 users to find the top 100, not all 50k.
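The "keep a running top-100 in memory" step is a classic bounded min-heap. Here's a minimal pure-Python sketch of the idea (my own illustration, not Postgres internals): stream per-user counts, and only ever hold N candidates.

```python
import heapq

def top_n(pairs, n):
    """Stream (user_id, event_count) pairs, keeping only the current top n
    in a bounded min-heap: O(total * log n) instead of a full sort."""
    heap = []  # min-heap keyed on event_count; smallest survivor sits at heap[0]
    for user_id, count in pairs:
        if len(heap) < n:
            heapq.heappush(heap, (count, user_id))
        elif count > heap[0][0]:
            # New candidate beats the weakest current survivor; swap them
            heapq.heapreplace(heap, (count, user_id))
    # Largest counts first, like ORDER BY event_count DESC
    return [(uid, c) for c, uid in sorted(heap, reverse=True)]

print(top_n([("a", 5), ("b", 1), ("c", 9), ("d", 3)], 2))  # [('c', 9), ('a', 5)]
```

This is why limiting early pays off: the memory and sort cost depend on N, not on the total number of users.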
The Lesson
Don’t process what you don’t need.
Common antipatterns:
- SELECT * when you need 3 columns
- Loading all rows when you need top N
- Joining all tables when you can filter first
- Sorting in application when database can do it
How to Find These
1. Profile Everything
EXPLAIN (ANALYZE, BUFFERS) your_query;
Look for:
- High actual time vs. cost
- Seq scans on big tables
- Sorts with many rows
- Nested loops with high loop counts
2. Question Joins
Ask: “Do I need ALL rows from this join?”
- Often you can filter before joining
- Sometimes you can aggregate before joining
- Occasionally you can avoid the join entirely
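"Aggregate before joining" is easy to verify on toy data. This sqlite3 sketch (my own example, not from the article) runs both shapes and confirms they agree, while the second pushes only one row per user through the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany("INSERT INTO events (user_id) VALUES (?)", [(1,)] * 4 + [(2,)] * 2)

# Shape 1: join-then-aggregate -- every event row flows through the join
a = conn.execute("""
    SELECT u.name, COUNT(e.id) FROM users u
    LEFT JOIN events e ON e.user_id = u.id
    GROUP BY u.id ORDER BY u.id
""").fetchall()

# Shape 2: aggregate-then-join -- the join only sees one row per user
b = conn.execute("""
    SELECT u.name, COALESCE(c.n, 0) FROM users u
    LEFT JOIN (SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id) c
      ON c.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(a == b)  # same answer, far fewer rows through the join
```

Whenever the join key is also the GROUP BY key, this rewrite is usually safe; mixed keys need more care.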
3. Add Indexes (But Smartly)
-- Bad: index every column separately
CREATE INDEX ON events(user_id);
CREATE INDEX ON events(created_at);

-- Good: composite index matching the query pattern
CREATE INDEX ON events(user_id, created_at);
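You can check whether a query actually uses the composite index without leaving a test script. Here's a sqlite3 sketch (SQLite's `EXPLAIN QUERY PLAN`, the rough analogue of Postgres's `EXPLAIN`; schema mirrors the article's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
CREATE INDEX idx_events_user_created ON events(user_id, created_at);
""")

# Ask the planner how it would run the per-user aggregate
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT COUNT(*), MAX(created_at) FROM events WHERE user_id = ?
""", (42,)).fetchall()

# The last column of each plan row is the human-readable detail
detail = " ".join(row[-1] for row in plan)
print(detail)
```

If the detail line names `idx_events_user_created`, the index matches the query pattern; a `SCAN events` line would mean a full table scan instead.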
4. Benchmark Edge Cases
Test with:
- Empty tables
- Maximum data
- Worst-case distributions
Our query was fast with 100 users. At 50k? Disaster.
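A tiny harness makes that kind of scale test repeatable. This sketch (a hypothetical setup, not our production benchmark) times the join-everything query at two sizes so you can watch the slope rather than trust one number:

```python
import sqlite3
import time

def bench(n_users, events_per_user):
    """Build a fresh in-memory DB at the given scale and time the naive query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"u{i}") for i in range(n_users)])
    conn.executemany("INSERT INTO events (user_id) VALUES (?)",
                     [(i % n_users,) for i in range(n_users * events_per_user)])
    t0 = time.perf_counter()
    rows = conn.execute("""
        SELECT u.name, COUNT(e.id) AS c FROM users u
        LEFT JOIN events e ON e.user_id = u.id
        GROUP BY u.id ORDER BY c DESC LIMIT 100
    """).fetchall()
    return time.perf_counter() - t0, len(rows)

# Run at a small and a larger scale; the growth rate is the signal
for n in (100, 2000):
    secs, n_rows = bench(n, 20)
    print(f"{n} users: {secs * 1000:.1f} ms, {n_rows} rows")
```

Swap in a skewed distribution (a few users with most of the events) to cover the worst-case shapes mentioned above.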
The Code
Before (40 lines of SQL):
-- Original query
After (40 lines of SQL):
-- Optimized query with LATERAL join
Same line count. 400x faster. The best optimizations are usually just better algorithms.
Takeaway
Performance work is often not about making code faster—it’s about making code do less work.
Measure, question assumptions, and optimize the algorithm before optimizing the code.