Hacker News new | ask | show | jobs
by felipe_aramburu 2673 days ago
Ok. Right now we are in tunnel vision mode to get our distributed version out by GTC in mid march. We will benchmark against clickhouse sometime in March. Do you know of any benchmark tests that are a bit more involved in terms of query complexity? We are most interested in queries where you can't be clever and use things like indexing and precomputed materializations.

The more complex the query the less you can rely on being clever and the more the guts need to be performant and that is more important to us right now.

2 comments

I work for Altinity, which offers commercial support for ClickHouse. We like benchmarks. :)

We use the DTC airline on time performance dataset (https://www.transtats.bts.gov/tables.asp?DB_ID=120) and Yellow Taxi trip data from NYC Open Data (https://data.cityofnewyork.us/browse?q=yellow%20taxi%20data&...) for benchmarking real-time query performance on ClickHouse. I'm working on publishing both datasets in a form that makes it easy to load them quickly. Queries are an exercise for the reader but see Mark Litwintschik's blog for good examples of queries: https://tech.marksblogg.com/billion-nyc-taxi-clickhouse.html.

We've also done head-to-head comparisons on time series using the TSBS benchmark developed by the Timescale team. See https://www.altinity.com/blog/clickhouse-timeseries-scalabil... for a description of our tests as well as a link to the TSBS Github project.

On an unrelated note: Oh, if you guys are using the OnTime data, have a look at this: https://github.com/eyalroz/usdt-ontime-tools
BTW, I think you do need to consider materialized views. ClickHouse materialized views function like projections in Vertica. They can apply different indexing and sorting to data. Unless your query patterns are very rigid it's hard to get high performance in any DBMS without some ability to implement different clustering patterns in storage.