by thegeomaster 1468 days ago
Sounds like they didn't redo the QuestDB benchmark with the same change to the indexes, so their claim is that ClickHouse with a specific index is 27x faster than QuestDB without that index. That is not a fair comparison.

Also, the tone of the post sounds really arrogant. They try to hide it a bit, I feel, but it just seeps through.

3 comments

I didn't really read it as arrogant, more as annoyed about a mischaracterization that was disparaging their product.
It's also part of a longer trend of saber rattling between these vendors. There's a history of these types of posts from TimescaleDB as well: https://news.ycombinator.com/item?id=29096541
There is a small list of vendors that do not forbid running benchmarks on their systems. https://cube.dev/blog/dewitt-clause-or-can-you-benchmark-a-d...

That is why only a small subset of vendors is being 'attacked' by these comparisons.

More and more, we start to see why these prohibitions are in place.
Well, I don't know how QuestDB works, and I couldn't find anything in the original benchmark, but they probably already have some sort of (geo)index in place? It's really strange to search geo-data by scanning the whole surface of the Earth. The point that ClickHouse outperforms this just by sorting on one axis (without even using any fancy 2D indices) is reasonable.
No, there are no indexes in QuestDB in the article. None. Zero. That's a bold mistake in the ClickHouse article. It should be named "Yes, QuestDB is Faster".
Why does the lack of indexes matter? Especially when the size on disk is so much higher? Defining a sensible index isn't an unreasonable or daunting task, and minimal effort in CH got a 4x speedup over QuestDB. "It's faster if you invest literally zero time making it efficient" doesn't offer any practical benefit to anyone.

If it were demonstrated that QuestDB did a better job overall in the majority of cases where an optimization would have been missed, that would be one thing. But this feels awfully nitpicky.

The article is not _just adding an index_. They are embedding one of the search fields in the table's _primary key_. That likely means the whole physical table layout is tailored for that single specific query.

While it can help to win this very benchmark, it's questionable whether it's usable in practice. Chances are an analytical database serves queries of various shapes. If you only need to run a single query over and over again, then you might be better off with a stream processing engine anyway.
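
To make the layout argument concrete, here is a minimal Python sketch of the difference between scanning every row and keeping the rows physically sorted on one axis. It is a toy model only: the random data, the function names, and the in-memory lists are all assumptions for illustration, and it does not reflect how ClickHouse or QuestDB actually store or filter data.

```python
import bisect
import random

# Hypothetical data: (latitude, longitude) pairs, not the benchmark's dataset.
random.seed(0)
rows = [(random.uniform(-90, 90), random.uniform(-180, 180)) for _ in range(100_000)]


def full_scan(table, lat_lo, lat_hi, lon_lo, lon_hi):
    """Check every row against both predicates."""
    return [r for r in table if lat_lo <= r[0] <= lat_hi and lon_lo <= r[1] <= lon_hi]


# Physically order the table by latitude, roughly what a sort key would do.
by_lat = sorted(rows)
lats = [r[0] for r in by_lat]


def sorted_lookup(lat_lo, lat_hi, lon_lo, lon_hi):
    """Binary-search the latitude range, then filter longitude only inside it."""
    start = bisect.bisect_left(lats, lat_lo)
    end = bisect.bisect_right(lats, lat_hi)
    candidates = by_lat[start:end]
    hits = [r for r in candidates if lon_lo <= r[1] <= lon_hi]
    # Also report how many rows had to be examined.
    return hits, len(candidates)


scan_hits = full_scan(rows, 10.0, 11.0, -20.0, 20.0)
fast_hits, rows_touched = sorted_lookup(10.0, 11.0, -20.0, 20.0)
```

Both paths return the same rows, but the sorted path only examines the slice inside the latitude range. A query that filters on longitude alone gets nothing from this ordering and falls back to reading everything, which is the "tailored for one query shape" concern.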

The primary key is, in effect, an index. Specializing on the latitude field of a table of geographic data seems like an incredibly small thing to nitpick.
Yeah, I've read it more carefully and it seems they're doing a full scan.
I was curious to hear more details about this statement: "while QuestDB utilizes its full indexing strategy to read just a tiny fraction of the actual data". Did QuestDB create indexes in their benchmark but just not mention it? Are there geo-indexes that are enabled automatically and do help (but are of less value in the general sense from ClickHouse's perspective)?
I don't know how QuestDB is implemented in any detail, but this statement struck me as confused. My understanding is that for this query, QuestDB is performing a full scan of the relevant columns, and the point of the blog post was how fast their JIT engine for filtering makes this.
There were 2 queries in the QuestDB benchmark over the same table. ClickHouse didn't even try to match both of them, choosing one as a victim. I guess that's what happens when you optimise the data storage for one query.