by mjarrett 576 days ago
What kinds of SQL queries could ClickHouse not handle? Were the limitations about expressivity of queries, performance, or something else? I'm considering using CH for storing observability (particularly tracing) data, so I'm curious about any footguns or other reasons it wouldn't be a good fit.

I'm editing the transcript right now, and he says it's more about exposing a nice API to the user.

E.g., ClickHouse's support for intervals, which are an important type for observability, was lacking. You couldn't subtract datetimes to get an interval. If you compared a 2-millisecond interval to a 1-second one, it wouldn't look at the unit and would say 2 ms is bigger, etc. So he had to go to the dev team, and after enough back and forth, instead of fixing it, they decided to return an error, and he had to insist for a long time until they actually implemented a proper solution.
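To make the interval problem concrete, here is a rough Python sketch of the behaviour described above. It is not ClickHouse code and doesn't reproduce their implementation; the `(value, unit)` tuples and the `naive_is_longer` helper are made up purely for illustration, and the "good" side just uses the standard library's `timedelta`.

```python
from datetime import datetime, timedelta

# Subtracting two datetimes should give you an interval, not an error.
span_start = datetime(2024, 1, 1, 12, 0, 0)
span_end = datetime(2024, 1, 1, 12, 0, 1, 500_000)  # 1.5 seconds later
duration = span_end - span_start  # a timedelta


def naive_is_longer(a, b):
    """Hypothetical unit-blind comparison: only the magnitudes are compared.

    Each argument is an illustrative (value, unit) tuple; the unit is ignored,
    which is the kind of bug described in the comment above.
    """
    a_value, _a_unit = a
    b_value, _b_unit = b
    return a_value > b_value


# Unit-blind: 2 > 1, so "2 ms" wrongly counts as longer than "1 s".
naive_result = naive_is_longer((2, "ms"), (1, "s"))

# Unit-aware: timedelta normalises both values before comparing.
two_ms = timedelta(milliseconds=2)
one_s = timedelta(seconds=1)
aware_result = two_ms > one_s

print(duration.total_seconds(), naive_result, aware_result)
```

For tracing data this matters because span durations are exactly this kind of value: you subtract an end timestamp from a start timestamp and then filter or sort on the result.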

Quoting him: "But like these endless issues with ClickHouse's flavor of SQL were problematic."

Another problem seemed to be that benefiting from very large scale, with things like data in Parquet at rest plus a local cache, basically meant leaking all your money to AWS, because the self-hosted version didn't expose a way to do that yourself. ClickHouse scales fine at my size, so I can only trust him on that front since I'm nowhere near that big.

Funnily enough, after that they moved to Timescale, and the performance wasn't good enough for their use case.

They landed on DataFusion after a lot of trial and error.

But it's a really interesting perspective on the whole thing; you can see he is kinda obsessed with the user experience. The guy wrote a popular marshmallow alternative, two popular celery alternatives, and one popular watchdog alternative, all FOSS.

These kinds of people are the source of all imposter syndrome in the world.

I'll publish that video next week on Bite Code if I can. If I can't, it will have to wait 3 weeks because I'm leaving for a bit. But the one with Charlie Marsh (uv's author) is up, if you are into overachievers.

One of the devs working on Logfire here. Part of it was the level of support. Like Samuel said, the ClickHouse folks were not receptive to bug reports. The Timescale team is leagues ahead in that sense; they’re super responsive and helpful. Ultimately, one of the reasons for choosing DataFusion was that it’s a much more approachable project, and indeed we’ve already gotten tremendous bidirectional benefit: the DataFusion team has helped us figure out some complex bits and we’ve made significant upstream contributions. By the way, DataFusion is now the fastest single-node query engine on ClickBench: https://datafusion.apache.org/blog/2024/11/18/datafusion-fas...

Another reason we use DataFusion is multi-tenancy: we found it was hard to use RLS and such to implement multi-tenancy. We’ve had much better luck with the extensibility of DataFusion.