| HN Mirror


by ants_a 1701 days ago

From my experience benchmarking these databases on scientific data (highly regular time series) and looking at the internals of both, these kinds of numbers happen when answering the query needs crunching through many rows but the output has few, i.e. the queries are filtering and/or aggregating a ton of input rows that can't be excluded by indexes or answered from preaggregations.

From what I can tell it comes down to execution engine differences. Timescale, even with compressed tables, uses a row-by-row execution engine architecturally resembling IE6-era JS engines. ClickHouse uses a batched and vectorized execution engine utilizing SIMD. The difference is one to two orders of magnitude of throughput in terms of the raw number of rows per core pushed through the execution engine.
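To make the contrast concrete, here is a minimal Python sketch of the two control-flow shapes. It is an illustration only, not code from either database: the `(sensor_id, value)` schema, the function names, and the batch size are all hypothetical, and plain Python cannot show SIMD. The first half pulls one row at a time through a tree of operators; the second half hands each operator a whole chunk of column values at once.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, float]  # hypothetical schema: (sensor_id, value)


# --- Row-at-a-time: every operator hands exactly one row to its parent ---
def scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row


def filter_rows(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:
        if pred(row):  # one predicate call per row
            yield row


def sum_rows(child: Iterator[Row]) -> float:
    total = 0.0
    for _, value in child:  # one pull through the whole tree per row
        total += value
    return total


# --- Batched: operators work on chunks of column values ---
def sum_batched(ids: List[int], values: List[float], wanted: int,
                batch_size: int = 1024) -> float:
    total = 0.0
    for start in range(0, len(ids), batch_size):
        id_chunk = ids[start:start + batch_size]
        value_chunk = values[start:start + batch_size]
        # Per-operator overhead is paid once per batch; a real engine would
        # run a tight (possibly SIMD) loop over the chunk here.
        total += sum(v for i, v in zip(id_chunk, value_chunk) if i == wanted)
    return total
```

Both paths compute the same filtered sum. What differs is how often the per-operator overhead is paid: once per row in the first shape, once per batch in the second.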

Postgres/Timescale could certainly also implement a similar model of execution, but to call it an undertaking would be an understatement, considering the breadth and extensibility of features that the execution engine would need to support. To my knowledge no one is seriously working on this outside of limited-capability hacks like the vops or PG-Strom extensions.


ryanbooz 1701 days ago

(post author)

You do a great job summarizing some of the benefits of ClickHouse we mentioned in the post, including the vectorized engine!

That said, I'm not sure I'd refer to the PostgreSQL/TimescaleDB engine architecture as resembling IE6 JS support. Obviously YMMV, but every release of PG and TimescaleDB brings new advancements in query optimization for the architecture they are designed for, which was the focus of the post.

I'm personally still impressed, after 20+ years of working with SQL and relational databases, when any optimization engine can use statistics to find the "best" plan among (potentially) thousands in a few ms. Maybe I'm too easily impressed. :-D


ants_a 1701 days ago

The optimization engine is of course great (despite occasionally missing hard), but I am not referring to it. I am referring to the way PostgreSQL executes query plans: the way rows are pulled up the execution tree is very similar to first-iteration JavaScript engines, a tree-based interpreter. Picking out columns from rows and evaluating expressions used to work the same way until PG11, where we got a bytecode-based interpreter and a JIT for those. But so far row handling still works the same way, and it hurts pretty badly when row lookup is cheap and the rows end up either thrown away or aggregated together with basic math.


mfreed 1701 days ago

With TimescaleDB compression, 1000 rows of uncompressed data are compressed into column segments, moved to external TOAST pages, and then pointers to these column segments are stored in the table's "row" (along with other statistics, including some common aggregates).

So while the query processor might still be "row-by-row", each "row" it processes actually corresponds to a column segment for which parallelization/vectorization is possible. And because these column segments are TOASTed, the row itself is just pointers, and you only need to read in those compressed column segments that you are actually SELECTing.
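The layout described above can be sketched roughly as follows. This is a toy model under stated assumptions, not TimescaleDB's actual storage code: `SegmentRow`, `build_segments`, and `sum_column` are hypothetical names, the per-segment min/max time statistics are an assumed example of the "other statistics", plain Python lists stand in for the compressed, TOASTed column segments, and no real compression is performed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SegmentRow:
    """One 'row' of the compressed table: statistics plus per-column segments."""
    count: int
    min_time: float
    max_time: float
    # Stand-in for pointers to compressed column segments stored out of line.
    columns: Dict[str, List[float]]


def build_segments(rows: List[Dict[str, float]], time_col: str = "time",
                   segment_size: int = 1000) -> List[SegmentRow]:
    """Group up to `segment_size` uncompressed rows into one SegmentRow."""
    segments = []
    for start in range(0, len(rows), segment_size):
        chunk = rows[start:start + segment_size]
        cols = {name: [r[name] for r in chunk] for name in chunk[0]}
        times = cols[time_col]
        segments.append(SegmentRow(len(chunk), min(times), max(times), cols))
    return segments


def sum_column(segments: List[SegmentRow], column: str,
               t_from: float, t_to: float, time_col: str = "time") -> float:
    """Sum one column over a time range, touching only the columns it needs."""
    total = 0.0
    for seg in segments:
        # Statistics on the row let whole segments be skipped unread.
        if seg.max_time < t_from or seg.min_time > t_to:
            continue
        times = seg.columns[time_col]
        vals = seg.columns[column]  # other columns are never looked at
        total += sum(v for t, v in zip(times, vals) if t_from <= t <= t_to)
    return total
```

The executor still iterates "row by row" over `segments`, but each step works on up to 1000 values of a single column, and unselected columns are never read.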

Anyway, you might have known this, just wanted to clarify. Thanks for the discussion!


nhoughto 1700 days ago

Yeah, very interesting. I was wondering how Timescale pushed Postgres more towards columnar without rewriting a bunch of Postgres itself.

My understanding of TOAST is that it is itself just a bunch of rows in a TOAST table that split the compressed "row", or in this case "1000 rows of 1 column", across as many rows as required to store the data whilst remaining within the Postgres page size limits (normally 8 kB).

With the often-quoted Postgres per-row overhead of ~23 bytes, which you would have to pay for each TOAST row as well, does this not add up and eat into your storage efficiency? Or does compression work so well that the 23 bytes x N rows (1 row pointing to TOAST + N TOAST rows) required to store the "row" isn't important?
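The arithmetic behind that question can be sketched as below. It uses only the ~23-byte per-row figure quoted above and ignores alignment padding, item pointers, and any other per-row or per-page costs; the function name and the payload sizes you pass in are hypothetical.

```python
ROW_HEADER_BYTES = 23  # per-row overhead figure quoted in the comment above


def header_overhead_fraction(payload_bytes: int, toast_rows: int) -> float:
    """Fraction of stored bytes spent on row headers.

    Counts one pointer row plus `toast_rows` TOAST rows, as in the question,
    and treats `payload_bytes` as the compressed size of the column segment.
    """
    header_bytes = ROW_HEADER_BYTES * (1 + toast_rows)
    return header_bytes / (header_bytes + payload_bytes)
```

Under these simplifications the header cost grows linearly with the number of TOAST rows, so how much it matters depends on how large the compressed segment is relative to 23 x (1 + N) bytes.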


mfreed 1700 days ago

The compressed column segment is stored in a single row in TOAST.

More info: https://blog.timescale.com/blog/building-columnar-compressio...


nhoughto 1700 days ago

Does Timescale do its own compression algorithm too? I see that in PG 14, TOAST column compression can be lz4 instead of the out-of-the-box pglz, which apparently has a few problems; I see mentions on the mailing list of significant possible optimizations. When dealing with EBS-style storage, where read latencies can be multiple milliseconds, compression is always going to be a win, but it's an easy optimization either way, I'd think.


nhoughto 1700 days ago

Ah, so it only costs 1 row for the pointer and 1 row for TOAST? Well, that's much more deterministic.
