A SQL Heuristic: Ors Are Expensive

https://news.ycombinator.com/rss Hits: 17
Summary

A SQL Heuristic: ORs Are ExpensiveQuery planning is hard. Sometimes.Queries often have more than one filter (using an and clause).But the developer can often end up in situations requiring an or clause:select count(*) from application where submitter_id = :user_id or reviewer_id = :user_id; But this is slow! With 1,000,000 applications and 1000 users uniformly distributed on both columns, it takes over 100ms.1If we rewrite it with only andsselect ( select count(*) from application a where a.reviewer_id = :user_id ) + ( select count(*) from application a where a.submitter_id = :user_id ) - ( select count(*) from application a where a.submitter_id = :user_id and a.reviewer_id = :user_id ); This takes less than 1ms; Over 100 times faster! 12This is surprising — we have indexes on the filtered columns.But why? First, we need to develop some intuition about how and when indexes are used.Basic Query Planning IntuitionSay you have a table with two columns and an index on each column.create table example_table ( text_column varchar(2) not null, num_column int8 not null ); Doing a simple lookup via a column is similar in time complexity to doing a lookup with a BTreeMap<String, ExampleTable> or BTreeMap<i64, ExampleTable>.And if we use a compound index — (text_column, num_column), it's like doing a lookup with BTreeMap<String, BTreeMap<i64, ExampleTable>>. So filtering with an and (or filtering by the first, then sorting by the latter) is natural with a compound index.Now say we don't have a compound index, only individual indexes, then we have to be clever. We can use database statistics.If we want to check where text_column = 'ES' and num_column = 1, there's two options:lookup 'ES', then filter the results by the num_columnlookup 1, then filter by text_columnA couple problems to note here:If there's many 'ES's relative to the number of 1s, or vice-versa, and we pick the wrong plan, we're reading more data from disk than we need. Remember, indexes are stored in chunks or pa...

First seen: 2025-09-29 21:34

Last seen: 2025-09-30 13:37