by ryanworl
2709 days ago
For analytics workloads, your best bet is using compression techniques that let you operate on the data without decompressing it. A good example is dictionary encoding a sorted set of string keys, so each key’s integer code preserves sort order. You can then perform prefix queries with a greater-than and less-than comparison on the integer codes instead of examining every string in full. Once you’ve encoded the data into large enough blocks, you could use any storage engine and write the encoded blocks into it, along with metadata tracking which blocks belong to which tables and partitions. You can also just use something like Parquet or ORC, but that’s not going to get you the best performance possible.
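To make the dictionary encoding idea concrete, here is a rough Python sketch. The function names and the sample column are made up for illustration and don't come from any particular system. It assumes the whole dictionary fits in memory as a sorted list, with each key's position serving as its integer code.

```python
from bisect import bisect_left


def build_dictionary(values):
    """Sorted distinct keys; a key's index in this list is its integer code."""
    return sorted(set(values))


def encode(dictionary, values):
    """Replace each string with its order-preserving integer code."""
    return [bisect_left(dictionary, v) for v in values]


def prefix_upper_bound(prefix):
    """Smallest string greater than every string starting with prefix.

    Returns None when no such bound exists (e.g. the empty prefix).
    """
    chars = list(prefix)
    while chars:
        if ord(chars[-1]) < 0x10FFFF:
            chars[-1] = chr(ord(chars[-1]) + 1)
            return "".join(chars)
        chars.pop()  # last char is already the maximum code point; carry left
    return None


def prefix_code_range(dictionary, prefix):
    """Half-open code range [lo, hi) covering all keys that start with prefix."""
    lo = bisect_left(dictionary, prefix)
    upper = prefix_upper_bound(prefix)
    hi = len(dictionary) if upper is None else bisect_left(dictionary, upper)
    return lo, hi


def filter_by_prefix(codes, lo, hi):
    """Row positions whose code falls in [lo, hi); no string is ever touched."""
    return [row for row, code in enumerate(codes) if lo <= code < hi]


# Hypothetical sample column, purely for illustration.
column = ["apple", "banana", "apricot", "cherry", "apple", "avocado", "blueberry"]
dictionary = build_dictionary(column)
codes = encode(dictionary, column)

lo, hi = prefix_code_range(dictionary, "ap")
matching_rows = filter_by_prefix(codes, lo, hi)
```

The strings are only consulted twice, during the two binary searches on the dictionary. The scan over the column itself is pure integer comparison, which is the part a real engine would vectorize.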
The best explanation of the various techniques that go into the data structures and operator designs for OLAP workloads is the survey 'The Design and Implementation of Modern Column-Oriented Database Systems' by Abadi, Boncz, Harizopoulos, Idreos, and Madden: http://db.csail.mit.edu/pubs/abadi-column-stores.pdf