
by anonacct37 944 days ago

Can't answer your question directly but I've touched a few of those here and there.

ClickHouse was deployed to replace a home-grown distributed storage system that, a long time ago, was much cheaper and faster than what the team was getting with BQ (BigQuery).

We evaluated ClickHouse and Druid for a few of our data stores, with the goal of running interactive queries on a fairly high-throughput clickstream data pipeline.

ClickHouse won on performance. We did some chaos testing on it, simulating node outages and network partitions, and we were happy with the results. My only complaint is that some other options don't require me to run a database at all, and that's nice when I can get away with it.

One thing you might try is dumping your data into Parquet files on GCS. Quite a few databases can query those files and give good results, depending on your indexing and partitioning needs. It's tough to get a lower operational burden than "stick it in cloud storage and sometimes spin up some stateless compute to query it".
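Much of the "partitioning" part comes down to file layout. A common convention is Hive-style `key=value` directory names, which lets a query engine skip whole files before reading any Parquet data. The sketch below shows that pruning step in plain Python under stated assumptions: the bucket `example-bucket`, the table `clicks`, the `dt` partition key, and all three helper functions are hypothetical, and real engines also prune with Parquet row-group statistics, which this does not model.

```python
from datetime import date, timedelta


def partition_path(bucket: str, table: str, day: date, part: int) -> str:
    # Hypothetical Hive-style layout: one directory per day, several files per day.
    return f"gs://{bucket}/{table}/dt={day.isoformat()}/part-{part:05d}.parquet"


def partition_values(path: str) -> dict:
    # Collect every key=value directory segment from an object path.
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            values[key] = value
    return values


def prune_by_date(paths, start: date, end: date) -> list:
    # Keep only files whose dt partition falls in [start, end];
    # everything else is skipped without being opened.
    kept = []
    for path in paths:
        dt = date.fromisoformat(partition_values(path)["dt"])
        if start <= dt <= end:
            kept.append(path)
    return kept


# 30 days x 4 files per day, but a 3-day query only needs 12 of them.
first_day = date(2023, 1, 1)
all_files = [
    partition_path("example-bucket", "clicks", first_day + timedelta(days=d), p)
    for d in range(30)
    for p in range(4)
]
selected = prune_by_date(all_files, date(2023, 1, 10), date(2023, 1, 12))
print(len(all_files), len(selected))  # 120 12
```

The point of the layout is that the stateless compute only lists and reads what the filter needs, so the cost of a query tracks the date range rather than the size of the whole table.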

I think DuckDB is super cool, but at the moment it's a solution that I'm still looking for a problem for.


whalesalad 944 days ago

We’re also using DuckDB in prod, and the performance is phenomenal.

We haven’t tried using it to read Parquet directly from cloud storage, but that’s on our to-do list. We currently use it to generate a massive table once a day, which is then queried throughout the day to serve prod requests.
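The build-once-a-day, query-all-day pattern can be sketched roughly as follows. This is an illustration under assumptions, not the setup described above: it uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs with the standard library alone, and the table name `daily_stats`, its columns, and both helper functions are hypothetical. The nightly job builds the table in a temporary file and swaps it into place with `os.replace`, so request handlers never see a half-built table. Request handlers open the file read-only.

```python
import os
import pathlib
import sqlite3
import tempfile


def build_daily_snapshot(final_path: str, rows) -> None:
    # Nightly job (sqlite3 standing in for DuckDB): build the whole table in a
    # temp file next to the target, then atomically swap it into place.
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)  # SQLite treats the empty file as a new database
    con = sqlite3.connect(tmp_path)
    try:
        con.execute(
            "CREATE TABLE daily_stats (user_id TEXT PRIMARY KEY, clicks INTEGER)"
        )
        con.executemany("INSERT INTO daily_stats VALUES (?, ?)", rows)
        con.commit()
    finally:
        con.close()
    os.replace(tmp_path, final_path)  # atomic rename on POSIX, same filesystem


def lookup_clicks(db_path: str, user_id: str):
    # Serving path: read-only connection against today's snapshot.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        row = con.execute(
            "SELECT clicks FROM daily_stats WHERE user_id = ?", (user_id,)
        ).fetchone()
    finally:
        con.close()
    return row[0] if row else None
```

With DuckDB itself, the equivalent would roughly be a `CREATE TABLE ... AS SELECT` in the nightly job and `duckdb.connect(path, read_only=True)` in the serving path.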
