Hacker News
by jandrewrogers 3287 days ago
Some of the basic assertions, such as the relative inefficiency of block compression in database engines, are true. I've seen material gains from using context/content-aware compression and some commercial OLAP databases exploit this extensively. They appear to be using many of the same kinds of techniques.
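As a toy illustration of what "content-aware" can mean (not how any particular OLAP engine actually does it): a generic block compressor treats a page as opaque bytes. An encoder that knows a column holds sorted integers can instead store only the gaps between neighbouring values as variable-length integers. The sketch below assumes a sorted, non-negative integer column and uses LEB128-style varints. The function names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """LEB128-style unsigned varint: 7 payload bits per byte, high bit = 'more bytes follow'."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)         # last byte of this integer
            return bytes(out)


def decode_varints(data: bytes) -> list:
    """Decode a concatenation of varints back into a list of integers."""
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def delta_encode_sorted(column: list) -> bytes:
    """Hypothetical column encoder: store gaps between consecutive sorted values."""
    out = bytearray()
    prev = 0
    for v in column:
        if v < prev:
            raise ValueError("column must be non-negative and sorted ascending")
        out += encode_varint(v - prev)
        prev = v
    return bytes(out)


def delta_decode_sorted(data: bytes) -> list:
    """Rebuild the original column by summing the gaps back up."""
    out, total = [], 0
    for gap in decode_varints(data):
        total += gap
        out.append(total)
    return out
```

For the column `[1000, 1003, 1007, 1010]`, the first gap (1000) takes two varint bytes and the remaining gaps (3, 4, 3) take one byte each. The encoding is 5 bytes, versus 32 bytes for four raw 64-bit integers. A byte-oriented block compressor has no way of knowing the data is a sorted integer sequence, so it cannot exploit that structure directly.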

However, the assertions made around caching behavior, such as wasting memory due to double caching, are not generally true. While you will see this in simple/naive database engines, a sophisticated high-performance database implementation won't be designed this way.


Thanks. These assertions are there to give some basic background on database performance in general. The real game changer with Terark is our novel compression algorithm. It's more space-efficient, but above all, it lets us search directly in the compressed data without decompressing it. That's the real breakthrough.
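The thread doesn't explain how Terark's algorithm searches compressed data. As an illustration of the general idea only, here is a sketch using front coding, a different and well-known technique. Sorted keys are grouped into blocks. Each block keeps its first key uncompressed, and the other keys store only the length of the prefix they share with the previous key plus the remaining suffix. A lookup binary-searches the uncompressed block heads and decodes at most one block, so most of the data is never expanded. `FrontCodedDict`, `block_size` and `blocks_decoded` are hypothetical names, and entries are kept as Python tuples rather than a packed byte layout.

```python
import bisect


def _common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class FrontCodedDict:
    """Hypothetical front-coded set of sorted, unique strings."""

    def __init__(self, sorted_keys, block_size=4):
        keys = list(sorted_keys)
        if keys != sorted(set(keys)):
            raise ValueError("keys must be unique and sorted")
        self.heads = []   # first key of each block, stored in full
        self.blocks = []  # per block: [(shared_prefix_len, suffix), ...]
        for start in range(0, len(keys), block_size):
            block = keys[start:start + block_size]
            self.heads.append(block[0])
            entries, prev = [], block[0]
            for key in block[1:]:
                shared = _common_prefix_len(prev, key)
                entries.append((shared, key[shared:]))
                prev = key
            self.blocks.append(entries)
        self.blocks_decoded = 0  # illustration only: how many blocks lookups touched

    def __contains__(self, key):
        # Binary search over the uncompressed heads picks the only candidate block.
        i = bisect.bisect_right(self.heads, key) - 1
        if i < 0:
            return False
        self.blocks_decoded += 1
        prev = self.heads[i]
        if prev == key:
            return True
        # Rebuild keys one at a time; stop as soon as we pass the target.
        for shared, suffix in self.blocks[i]:
            prev = prev[:shared] + suffix
            if prev == key:
                return True
            if prev > key:
                return False
        return False
```

For example, with `block_size=4` the key `"applet"` following `"apple"` is stored as `(5, "t")`, and checking `"candle" in d` decodes only the block whose head is `"band"`.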

We do that by using a data structure called Succinct Nested Trie, and we've introduced concepts such as CO-Index (Compressed Ordered Index) and PA-Zip (Point Accessible Zip).
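The internals of Succinct Nested Trie, CO-Index and PA-Zip aren't described in the thread. For background on what a "succinct" trie is, here is a minimal sketch of LOUDS (level-order unary degree sequence), a standard encoding that stores a trie's shape in 2n+1 bits for n nodes and navigates it with rank/select queries instead of pointers. This is not Terark's implementation. `LoudsTrie` is a hypothetical name, and the bits and rank/select helpers are plain Python lists for readability, where a real succinct structure would pack bits and use small sampled indexes.

```python
from collections import deque


class LoudsTrie:
    """Hypothetical LOUDS-encoded trie supporting membership queries."""

    def __init__(self, keys):
        # 1) Build an ordinary pointer-based trie out of nested dicts.
        root = {}
        for key in keys:
            node = root
            for ch in key:
                node = node.setdefault(ch, {})
            node[None] = True  # end-of-key marker (never a real label)

        # 2) Serialize level by level: each node emits one 1 per child, then a 0.
        bits = [1, 0]    # super-root whose single child is the real root
        labels = [""]    # labels[v] = edge label into node v (root unused)
        terminal = []    # terminal[v] = True if a key ends at node v
        queue = deque([root])
        while queue:
            node = queue.popleft()
            terminal.append(None in node)
            for ch in sorted(k for k in node if k is not None):
                bits.append(1)
                labels.append(ch)
                queue.append(node[ch])
            bits.append(0)
        self.bits, self.labels, self.terminal = bits, labels, terminal

        # 3) Rank/select support: rank1[i] = number of 1s in bits[0:i],
        #    zeros[k] = position of the k-th 0 (i.e. select0(k)).
        self._rank1 = [0]
        for b in bits:
            self._rank1.append(self._rank1[-1] + b)
        self._zeros = [i for i, b in enumerate(bits) if b == 0]

    def _children(self, v):
        """Yield (label, child_id) for node v, using only rank/select."""
        p = self._zeros[v] + 1          # node v's run of 1s starts after the v-th 0
        while p < len(self.bits) and self.bits[p] == 1:
            child = self._rank1[p]      # the 1 at position p belongs to node rank1(p)
            yield self.labels[child], child
            p += 1

    def __contains__(self, key):
        v = 0  # node 0 is the root
        for ch in key:
            for label, child in self._children(v):
                if label == ch:
                    v = child
                    break
            else:
                return False  # no edge labelled ch
        return self.terminal[v]
```

For the keys `a`, `ab`, `b`, the shape encodes as `10 110 10 0 0`: the super-root, the root's two children, `a`'s single child, and two leaves.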

We started out as a compression company and then turned to storage engines and databases as an application domain for our algorithms, hence the analogy with Pied Piper :)

How does your technique compare to a typical column store?