I really hope someone with deep knowledge of database internals will come here and comment on Terark's claims. The blog entry mentioned in the comments is a better source of information than that article.
Thank you, we really appreciate your comment. As I mentioned in another reply, we understand the claim may sound outlandish, so we try to be as transparent as possible:
- We provide several different benchmark results and detailed procedures: https://github.com/Terark/terarkdb/wiki/Benchmark
- We provide a free license of TerarkDB and you can download the exact scripts we used and run your own benchmarks with the configuration you want.
We're a bunch of geeks (I think the picture is worth a thousand words ^^) who had an itch to scratch and a lightbulb moment. We built a product around it and we're trying to make a sustainable business. Any feedback or comment is welcome.
We understand some people might be skeptical and we're happy to answer any questions. And if you like it, we would be thrilled if you could help spread the word. We're not the best at marketing... haha
Some of the basic assertions, such as the relative inefficiency of block compression in database engines, are true. I've seen material gains from using context/content-aware compression and some commercial OLAP databases exploit this extensively. They appear to be using many of the same kinds of techniques.
However, the assertions made around caching behavior, such as wasting memory due to double caching, are not generally true. While you will see this in simple/naive database engines, a sophisticated high-performance database implementation won't be designed this way.
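To make the block-compression point above concrete, here is a minimal sketch in Python. It is not taken from TerarkDB or any particular engine: the record format, the block size of 64 records, and the helper names `build_blocks` and `point_read` are all assumptions chosen for illustration. It shows that when records are compressed together as a block, reading a single record means decompressing the whole block.

```python
import zlib

BLOCK_SIZE = 64  # records per block; an arbitrary value for this sketch


def build_blocks(records, block_size=BLOCK_SIZE):
    """Group records into fixed-size blocks and compress each block as a unit."""
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        raw = "\n".join(chunk).encode("utf-8")
        blocks.append(zlib.compress(raw))
    return blocks


def point_read(blocks, index, block_size=BLOCK_SIZE):
    """Return (record, bytes_decompressed) for one record.

    The whole block has to be decompressed even though only one
    record out of block_size is wanted.
    """
    raw = zlib.decompress(blocks[index // block_size])
    record = raw.decode("utf-8").split("\n")[index % block_size]
    return record, len(raw)


# Hypothetical data: 256 short, similar-looking records.
records = [f"user:{i:06d}:payload-{i % 7}" for i in range(256)]
blocks = build_blocks(records)
record, bytes_touched = point_read(blocks, 130)
print(record, len(record), bytes_touched)
```

With these made-up records, the bytes decompressed for one lookup are far larger than the record itself. Whether that overhead matters in practice depends on block size, cache hit rate, and workload, which is exactly where the benchmarks would need to be checked.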
Thanks. These assertions are here to give basic background and an overview of database performance in general. The real game changer with Terark is our novel compression algorithm. It's more space efficient, that's one thing, but above all else we can search the compressed data directly, without decompressing it. That's the real breakthrough.
We do that by using a data structure called a Succinct Nested Trie, and we've introduced concepts such as CO-Index (Compressed Ordered Index) and PA-Zip (Point Accessible Zip).
We were at first a compression company, and turned to storage engines and databases as an application domain for our algos, hence the analogy with Pied Piper :)
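For readers wondering how a key can be looked up without decompressing anything, here is a minimal sketch of LOUDS, a well-known succinct trie encoding. This is not Terark's Succinct Nested Trie, whose details are not given in this thread; it is a swapped-in textbook technique, and the names `build_louds` and `LoudsTrie` are hypothetical. The trie shape is stored as a bit sequence and navigated with rank and select operations instead of pointers. Plain Python lists are used for readability, so this sketch is not itself space efficient, and real implementations answer rank/select with compact index tables.

```python
from collections import deque


def build_louds(keys):
    """Encode a trie of `keys` as LOUDS bits plus per-node labels and end flags."""
    # Build an ordinary pointer trie first, then serialize it breadth-first.
    root = {"kids": {}, "end": False}
    for key in keys:
        node = root
        for ch in key:
            node = node["kids"].setdefault(ch, {"kids": {}, "end": False})
        node["end"] = True

    bits = [1, 0]                    # super-root: one child (the root)
    labels = [None, None]            # index = node number; root (1) has no label
    terminal = [False, root["end"]]  # terminal[n] is True if a key ends at node n
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for ch in sorted(node["kids"]):
            child = node["kids"][ch]
            bits.append(1)           # one 1 bit per child
            labels.append(ch)
            terminal.append(child["end"])
            queue.append(child)
        bits.append(0)               # 0 bit closes this node's child list
    return bits, labels, terminal


class LoudsTrie:
    def __init__(self, keys):
        self.bits, self.labels, self.terminal = build_louds(keys)
        # rank1[p] = number of 1 bits in bits[0..p]; this is the node number at p.
        self.rank1, ones = [], 0
        for b in self.bits:
            ones += b
            self.rank1.append(ones)
        # zero_pos[k - 1] = position of the k-th 0 bit (select0).
        self.zero_pos = [p for p, b in enumerate(self.bits) if b == 0]

    def contains(self, key):
        node = 1  # the root is node number 1
        for ch in key:
            pos = self.zero_pos[node - 1] + 1  # start of this node's child list
            while self.bits[pos] == 1:
                child = self.rank1[pos]
                if self.labels[child] == ch:
                    node = child
                    break
                pos += 1
            else:
                return False  # no child carries this character
        return self.terminal[node]


trie = LoudsTrie(["car", "cart", "cat", "dog"])
print(trie.contains("cat"), trie.contains("ca"), trie.contains("dogs"))
```

The lookup only reads bits and labels along one root-to-leaf path, so nothing is expanded back into the original key list. How Terark's structure differs from this, for example in how tries are nested or how values are stored, is something only their documentation or code can answer.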