Why BM25 Queries with More Terms Can Be Faster
I discovered something weird about BM25 full-text search: queries with MORE terms can sometimes be FASTER than queries with fewer terms. This broke my mental model of how search works.
What Is BM25?
BM25 (Best Matching 25) is the ranking function behind most modern full-text search engines, including Elasticsearch (via Lucene) and SQLite's FTS5. (Strictly speaking, PostgreSQL's built-in ts_rank uses its own formula rather than BM25, but the index behavior discussed below applies to it all the same.)
It ranks documents by how well they match a query, considering:
- Term frequency: How often does the word appear?
- Inverse document frequency: How rare is the word across all documents?
- Document length: Normalize for long vs short documents
score = Σ IDF(q_i) · (f(q_i, D) · (k_1 + 1)) / (f(q_i, D) + k_1 · (1 - b + b · |D| / avgdl))
Where:
- q_i: Query term
- D: Document
- f(q_i, D): Term frequency in document
- |D|: Document length
- avgdl: Average document length
- k_1, b: Tuning parameters
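The formula above can be sketched directly in Python. This is a minimal illustration, not a production scorer: the smoothed IDF shown is the Lucene-style variant, and all names are invented for this example.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freqs, n_docs, avgdl,
               k1=1.2, b=0.75):
    """Score one document against a query with BM25.

    query_terms: list of query terms
    doc_terms:   list of tokens in the document
    doc_freqs:   dict mapping term -> number of documents containing it
    n_docs:      total documents in the corpus
    avgdl:       average document length
    """
    score = 0.0
    dl = len(doc_terms)  # |D|
    for q in query_terms:
        f = doc_terms.count(q)  # f(q_i, D)
        if f == 0:
            continue
        df = doc_freqs.get(q, 0)
        # Smoothed IDF (Lucene-style): rare terms get large weights
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * dl / avgdl))
    return score
```

With k1 = 1.2 and b = 0.75 (common defaults), a single occurrence of a rare term in an average-length document contributes far more than one occurrence of a common term, which is exactly why rare terms make good anchors later on.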
The Counterintuitive Behavior
Test setup: 1M documents, PostgreSQL full-text search.
-- One-term query (common word)
SELECT * FROM documents
WHERE tsv @@ to_tsquery('database')
ORDER BY ts_rank(tsv, to_tsquery('database')) DESC
LIMIT 10;
-- Time: 145ms
-- Three-term query
SELECT * FROM documents
WHERE tsv @@ to_tsquery('database & optimization & indexing')
ORDER BY ts_rank(tsv, to_tsquery('database & optimization & indexing')) DESC
LIMIT 10;
-- Time: 38ms
The three-term query is almost 4x faster than the one-term query. Why?
The Explanation
1. Posting List Intersection
Full-text search uses inverted indexes. Each word maps to a list of documents containing it:
"database" -> [doc1, doc2, doc3, doc4, ..., doc50000] (50k docs)
"optimization" -> [doc1, doc5, doc9, doc12, ..., doc8000] (8k docs)
"indexing" -> [doc2, doc5, doc8, doc9, ..., doc5000] (5k docs)
For a multi-term query with AND:
- Scan the SHORTEST posting list first
- Check if those docs are in other lists
- Only score documents in the intersection
"indexing" (5k docs)
∩ "optimization" (8k docs)
∩ "database" (50k docs)
= ~500 docs to score
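The intersection strategy above can be sketched in a few lines of Python. This is illustrative only: real engines use skip lists or galloping search over compressed postings rather than plain binary search, and all names here are invented.

```python
from bisect import bisect_left

def intersect_postings(postings):
    """Intersect sorted posting lists, shortest first.

    postings: list of sorted doc-ID lists, one per query term.
    Returns the doc IDs present in every list.
    """
    postings = sorted(postings, key=len)  # anchor on the rarest term
    candidates = postings[0]
    for plist in postings[1:]:
        surviving = []
        for doc in candidates:
            i = bisect_left(plist, doc)   # binary search: skip ahead
            if i < len(plist) and plist[i] == doc:
                surviving.append(doc)
        candidates = surviving
        if not candidates:
            break                          # empty intersection: done
    return candidates
```

Because the candidate set can only shrink, the cost is driven by the shortest list, not the longest: adding a rare term makes the whole query cheaper.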
2. Early Termination
When you need only top-N results, you can stop early:
1. Score first batch of candidate docs
2. Keep top 10
3. When next doc's max possible score < 10th place score: STOP
With more specific queries (more terms), you reach this threshold faster.
3. Score Bounds
BM25 scores have upper bounds based on IDF values. If you’re looking for top 10 and you’ve already found 10 docs with score > 5.0, you can skip checking docs where max possible score is < 5.0.
More query terms = higher threshold = more docs can be skipped.
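Early termination and score bounds combine into a loop like this minimal Python sketch. It assumes candidates arrive in descending order of their maximum possible score, as an impact-ordered index would provide; everything here is invented for illustration.

```python
import heapq

def top_n(candidates, n):
    """Collect the top-n scored documents with bound-based early exit.

    candidates: iterable of (doc_id, score, max_possible_score) tuples,
                in DESCENDING max_possible_score order.
    Returns up to n (score, doc_id) pairs, best first.
    """
    heap = []  # min-heap: heap[0] is the weakest doc currently in the top-n
    for doc_id, score, max_possible in candidates:
        # Once the heap is full, no later doc can beat the current
        # n-th place score, because bounds only decrease from here.
        if len(heap) == n and max_possible <= heap[0][0]:
            break  # early termination: stop scoring entirely
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)
```

More query terms raise the scores of the true top-N, which raises the threshold `heap[0][0]`, which in turn lets the loop stop sooner.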
When This Breaks Down
This speedup requires:
- Terms with different frequencies: One rare term to anchor on
- Top-N queries: Not scanning all results
- Good index: Posting lists sorted by doc ID or score
It breaks when:
- All terms are equally common (no good anchor)
- You need ALL results (can’t early-terminate)
- Terms are very rare (tiny posting lists, overhead dominates)
Practical Applications
1. Query Expansion
Add related terms to make queries MORE specific:
-- Slow
SELECT * FROM docs WHERE tsv @@ 'python';
-- Fast
SELECT * FROM docs WHERE tsv @@ 'python & programming & tutorial';
2. Negative Terms
Exclude common terms to narrow search:
SELECT * FROM docs
WHERE tsv @@ 'python & programming & !introduction'
LIMIT 10;
The !introduction term excludes about 30% of candidate documents, making the intersection smaller.
3. Phrase Queries
Phrases are even better:
-- "web server" as a phrase (websearch_to_tsquery understands quoted phrases;
-- equivalently: to_tsquery('web <-> server'))
SELECT * FROM docs WHERE tsv @@ websearch_to_tsquery('"web server"');
Phrase matching requires position checks, but operates on a tiny candidate set.
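Those position checks can be sketched in Python. This is a hypothetical illustration: `positions_by_term` is an invented structure holding each phrase term's token positions within a single candidate document.

```python
def phrase_match(positions_by_term):
    """Check whether phrase terms occur consecutively in one document.

    positions_by_term: list of sorted position lists, one per phrase
    term, in phrase order (e.g. positions of "web", then of "server").
    Returns True if the terms line up at consecutive positions.
    """
    # Candidate start positions of the phrase = positions of term 0
    starts = set(positions_by_term[0])
    for offset, positions in enumerate(positions_by_term[1:], start=1):
        # Keep only starts where term i also appears at start + i
        starts = {p - offset for p in positions if p - offset in starts}
        if not starts:
            return False
    return True
```

The check is linear in the number of positions, but it only ever runs on the small intersection of documents containing all phrase terms, which is why phrase queries stay fast.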
Benchmarks
Real query from our logs (1M document corpus):
Query: "database"
Matches: 89,432 docs
Time: 167ms
Query: "database optimization"
Matches: 3,421 docs
Time: 54ms
Query: "database optimization postgresql"
Matches: 412 docs
Time: 18ms
Query: "database optimization postgresql vacuum"
Matches: 47 docs
Time: 9ms
Each added term:
- Cuts the candidate set by roughly an order of magnitude
- Cuts query time by roughly 2-3x
Implementation Notes
This works because modern search engines:
- Sorted posting lists: Can skip ahead efficiently
- Score bounds: Calculate max possible scores without full evaluation
- Early termination: Stop when top-N found
- Heap-based ranking: Keep only top-N in memory
Lucene's top-N collector loop looks roughly like this (heavily simplified):
int doc;
while ((doc = scorer.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    float score = scorer.score();
    if (score <= minCompetitiveScore) {
        continue;                        // can't make the top-N: skip it
    }
    pq.insertWithOverflow(new ScoreDoc(doc, score));
    if (pq.size() == numHits) {
        // The weakest doc in the queue sets the bar for every later doc
        minCompetitiveScore = pq.top().score;
    }
}
Lessons
- More constraints = faster: Counterintuitively true for search
- Selectivity matters: One rare term beats many common terms
- Index structure enables optimization: Sorted posting lists are key
- Top-N is special: Different complexity than “scan all”
Try It Yourself
-- PostgreSQL example
CREATE TABLE docs (id INT, content TEXT);
CREATE INDEX idx_fts ON docs USING gin(to_tsvector('english', content));
-- Benchmark
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM docs
WHERE to_tsvector('english', content) @@ to_tsquery('term1 & term2 & term3')
LIMIT 10;
Watch how adding terms changes the execution plan.
Search is full of counterintuitive optimizations. This is just one of them.