Hacker News new | ask | show | jobs
by andrejserafim 1701 days ago
Our anecdata: we store telemetry per thing. After loading a month worth of data - timescaldb as hosted by their cloud ran a difference aggregation in seconds. Clickhouse routinely did it in 20 millis.

Simple avg, etc were better, but always clickhouse was an order of magnitude faster than timescale. We didn't invest a whole bunch into optimization other than trying some indexing strategies in timescaledb.

So for our use case the choice is clear.

4 comments

(N.B. post author)

Thanks for the feedback. Without knowing your situation, one of the things we show in the blog post is that TimescaleDB compression often changes the game on those kinds of queries (data is transformed to columnar storage when you compress). You don't mention if you did that or not, but it's something we've seen/noticed in every other benchmark at this point - that folks don't enable it for the benchmark.

And second point of the article is that you have lots of options for whatever works in your specific situation. But, make sure you're using the chosen database features before counting it out. :-)

I wonder if it's worth taking a page out of the MongoDB book and enabling these kinds of benchmark altering settings by default. We certainly selected clickhouse over tailscale internally because of major performance differences in our internal testing that might have gone the other way had we "known better".
Indeed. Lots of discussion over this in the last few months. There are nuances, but I think you'll see some progress in this area over the next year.
From my experience of benchmarking these databases on scientific data (highly regular timeseries) and looking at the internals of both, these kinds types of number happen when answering the query needs crunching through many rows, but the output has few. i.e. the queries are filtering and/or aggregating a ton of input rows, that can't be excluded by indexes or queried from preaggregations.

From what I can tell it comes down to execution engine differences. TimeScale, even with compressed tables, uses a row by row execution engine architecturally resembling IE6 era JS engines. ClickHouse uses a batched and vectorized execution engine utilizing SIMD. Difference is one to two orders of magnitude of throughput in terms raw number of rows per core pushed through the execution engine.

Postgres/TimeScale could certainly also implement a similar model of execution, but to call it an undertaking would be an understatement considering the breadth and extensibility of features that the execution engine would need to support. To my knowledge no one is seriously working on this outside of limited capability hacks like vops or PG-Strom extensions.

(post author)

You do a great job summarizing some of the benefits of ClickHouse we mentioned in the post, including the vectorized engine!

That said, I'm not sure I'd refer to PostgreSQL/TimescaleDB engine architecture as resembling IE6 JS support. Obviously YMMV, but every release of PG and TimescaleDB bring new advancements to query optimizations for the architecture they are designed for, which was the focus of the post.

I'm personally still impressed, after 20+ years of working with SQL, relational databases, when any optimization engine can use statistics to find the "best" plan among (potentially) thousands in a few ms. Maybe I'm too easily impressed. :-D

The optimization engine is of course great (despite occasionally missing hard), but I am not referring to it. I am referring to the way that PostgreSQL executes query plans, the way rows are pulled up the execution tree, is very similar to first iterations JavaScript engines - a tree based interpreter. Picking out columns from rows and evaluating expressions used to work the same until PG11, where we got a bytecode based interpreter and a JIT for those. But so far rows are still working the same way, and it hurts pretty bad when row lookup is cheap and the rows end up either thrown away or aggregated together with basic math.
With TimescaleDB compression, 1000 rows of uncompressed data are compressed into column segments, moved to external TOAST pages, and then pointers to these column segments are stored in the table's "row" (along with other statistics, including some common aggregates).

So while the query processor might still be "row-by-row", each "row" it processes actually corresponds to a column segment for which parallelization/vectorization is possible. And because these column segments are TOASTed, the row itself are just pointers, and you only need to read in those compressed column segments that you are actually SELECTing.

Anyway, might have known this, just wanted to clarify. Thanks for discussion!

yeah very interesting, i was wondering how timescale pushed postgres more towards columnar without rewriting a bunch of postgres itself.

My understanding of TOAST is that it itself is just a bunch of rows in a toast table that split the compressed "row" or in this case "1000 rows of 1 column" across as many rows as required to store the data whilst remaining within the postgres page size limits (normally 8kb).

With the often quoted postgres per row overhead of 23 bytes~ which you would have to pay for each TOAST row as well, does this not add up and eat into your storage efficiencies? or does compression work so well that the 23 bytes x N rows (1 row pointing to toast + N toast rows) required to store the "row" isn't important?

The compressed column segment is stored in a single row in TOAST.

More info: https://blog.timescale.com/blog/building-columnar-compressio...

Was this for your primary source-of-truth, or more of a downstream data warehouse, or something else?

I'm struggling to imagine a case where these are the two things being considered; Timescale is the obvious choice for a primary database, Clickhouse the obvious choice for a warehouse. I wouldn't let my user-facing app write to Clickhouse, and while I could potentially get away with a read-only Timescale replica for internal-facing reports I would expect to eventually outgrow that and reach for Clickhouse/Snowflake/Redshift.

> I wouldn't let my user-facing app write to Clickhouse

I’ve been thinking of doing exactly that. What are your concerns?

https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

has some good thoughts. The main thing you'll likely need is some sort of a buffer layer so you can do bulk inserts. Do not write a high-volume of single-row inserts into Clickhouse.

Chproxy is designed to handle this

https://github.com/Vertamedia/chproxy

Thanks for sharing the link! I’ve heard the bulk insert thing before and to be honest I’ve always thought that RDBMSs don’t love single row inserts either. Seems clickhouse takes that to a new level.

In our case we are using sqs and usually insert 20-100 rows into the db at a time so I’m going to benchmark how that does in clickhouse.

With Clickhouse you can use a "buffer table", which uses just RAM and sits on top of a normal table: https://clickhouse.com/docs/en/engines/table-engines/special...

Rows inserted into the buffer table are then flushed to the normal/base table when one of the limits (defined when the buffer table is created) is reached (limits are max rows, max bytes, max time since the last flush), or when you drop the buffer table.

I'm using it and it works (performance difference can be huge compared to perform single inserts directly into a real/normal table), but be careful - the flushed rows don't give a guarantee of which row is flushed in which sequence, so using a buffer table is a very bad idea if your base table is something which relies on the correct sequences of rows that it receives.

On a project I worked on we found the sweet spot to be 20k-60k rows per insert.
I suppose it depends what you're going to let your user do, but OLAPs in general and Clickhouse in particular don't do well under row-oriented workloads, as described in the post here. I'm imagining users primarily operating on small numbers of rows and sometimes making updates to or deleting them, a worst-case scenario for Clickhouse but best-case for an OLTP like Postgres.
Ah totally. Thanks for sharing your thoughts! In my case I’m evaluating clickhouse as a source of truth for customer telemetry data. Totally agree about the OLTP limitations.
(Remember that clickhouse is not reliable. It doesn’t pretend to be.

Clickhouse is great for lots of common query workloads, but if losing your data would be a big deal then it makes a lot of sense to have your data in a reliable and backed up place (eg timescale or just s3 files or whatever) too.

Of course lots of times people chuck stuff into clickhouse and it’s fine if they lose a bit sometimes. YMMV.)

I have not found this to be the case. Like any system you need to take precautions (replicas and sharding) to ensure no data loss, but I didn't find that to be challenging. In what way have you found ClickHouse particularly risky in this way?
> Clickhouse the obvious choice for a warehouse > Clickhouse/Snowflake/Redshift.

but clickhouse is very unlike the other two. when i think of a warehouse i think star schema, data modeling ect not something that hates joins.

Agreed, I wouldn't use Clickhouse for usual warehouse stuff either, mostly because I can't imagine it plays well with dbt which is a non-starter these days.

I'd still argue Clickhouse is closer to Snowflake/Redshift than anything OLTP, and their name is intentionally chosen to evoke warehouse-like scenarios.

What makes you think CH doesn’t like joins?

Having used Redshift, Snowflake and CH for similar workloads, I’d much prefer ClickHouse to the other 2.

Snowflake is hideously expensive for the subpar perf it offers in my experience and Redshift is mediocre at best in general.

Clickhouse is nothing like snowflake. A Postgres user can be useful in Clickhouse in minutes, not even close for snowflake.
Is your comment on ClickHouse and DBT based on using the DBT ClickHouse plugin? [0] If so I would be very interested in understanding what you or others see as deficiencies.

[0] https://github.com/silentsokolov/dbt-clickhouse

How many data points were those aggregations being computed over? How much memory does your Postgres server have, and are you using SSD storage (with associated postgres config tweaks)?
(Post author)

Howdy! We provided all of those details in the post and you're welcome to join us next week when we live-stream our setup and test!

https://blog.timescale.com/blog/what-is-clickhouse-how-does-...

I was responding to @andrejserafim, asking about their scenario, not the article.
Gotcha! My apologies for not seeing the thread nature. HN threads get me sometimes. :-)