by puzpuzpuz-hn
919 days ago
> Unless I'm missing something, hashing function is fast compared to random bouncing around inside ram – very much faster then random memory accesses. So I can't see how it make a difference.

In a GROUP BY, you may have a few hundred million rows but only a few hundred groups among them. A slow hash function would slow things down dramatically in that case, since the hash table remains small and data access is potentially linear.

> Then you've got sorted data, in which case use a merge join instead of a hash join surely.

This property is beneficial for a GROUP BY that includes a timestamp or a function over a timestamp. QuestDB organizes data sorted by time, so relying on insertion order may help to avoid redundant sorting if there is an ORDER BY clause with the timestamp column. As for merge join, we also use it in ASOF join: https://questdb.io/docs/reference/sql/join/#asof-join
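To illustrate the insertion-order point, here is a minimal Python sketch, not QuestDB's actual implementation. It assumes rows arrive sorted by timestamp and groups them into hourly buckets using a hash table that preserves insertion order (a plain Python `dict`). The function name `group_by_hour` and the `(timestamp_seconds, value)` row layout are hypothetical, chosen only for this example.

```python
def group_by_hour(rows):
    """Sum values per hourly bucket.

    rows: iterable of (timestamp_seconds, value), assumed sorted by timestamp.
    Returns a list of (bucket_start_seconds, sum) in first-seen order.
    """
    sums = {}  # Python dicts preserve insertion order
    for ts, value in rows:
        bucket = ts - ts % 3600  # truncate the timestamp to the hour
        sums[bucket] = sums.get(bucket, 0) + value
    # Input is time-sorted, so buckets were first inserted in time order:
    # no extra sort is needed to satisfy an ORDER BY on the bucket.
    return list(sums.items())


rows = [(0, 1), (1800, 2), (3600, 3), (7100, 4), (7200, 5)]
print(group_by_hour(rows))  # [(0, 3), (3600, 7), (7200, 5)]
```

With only a handful of buckets and many rows, the per-row cost here is mostly the hash and lookup, which is the situation described above.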
|
ISWYM, although that is a rather specific case. For your purposes it may well be a common one, though; I don't know.
> QuestDB organizes data sorted by time, so relying on insertion order may help to avoid redundant sorting if there is an ORDER BY clause with the timestamp column.
If data is already sorted and you have an 'order by' then just use the data directly – bingo, instant merge join, no hash table needed.
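For the GROUP BY case, one way to read this is as a single streaming pass: when rows are already sorted by the grouping key, equal keys are adjacent, so each group can be aggregated and emitted without any hash table. A rough Python sketch under that assumption follows. The name `streaming_group_sum` and the `(timestamp_seconds, value)` row layout are made up for illustration and do not describe any particular database.

```python
from itertools import groupby


def streaming_group_sum(rows, bucket_seconds=3600):
    """Sum values per time bucket in one pass over time-sorted rows.

    rows: iterable of (timestamp_seconds, value), assumed sorted by timestamp.
    No hash table is used: groupby only merges adjacent rows with equal keys,
    which is correct here because sorted input keeps each bucket contiguous.
    """
    def bucket_of(row):
        return row[0] - row[0] % bucket_seconds

    return [
        (bucket, sum(value for _, value in group))
        for bucket, group in groupby(rows, key=bucket_of)
    ]


rows = [(0, 1), (1800, 2), (3600, 3), (7100, 4), (7200, 5)]
print(streaming_group_sum(rows))  # [(0, 3), (3600, 7), (7200, 5)]
```

This only works when the grouping key is monotonic in the sort order, such as a time bucket over time-sorted data. For an arbitrary grouping column, equal keys are not adjacent and a hash table or a sort is still needed.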