by menaerus 1351 days ago
> and advocating for public, standardized benchmarks

For full transparency, I think you should do the same in ClickHouse. Or is there a strong reason not to run benchmarks on standard analytical workloads like TPC-H, TPC-DS or SSB?

2 comments

You can't post TPC benchmark results without an official audit, which complicates publishing them. You also won't find the systems that are usually compared with ClickHouse in the official results list [1]. So ClickBench, which is open and standardized, tries to encourage benchmarking for everyone.

There are numerous benchmarks that use queries similar to TPC's, but they are not standardized and can be misleading. For example, Fivetran put a lot of work into this report [0], but it shows only the overall geomean for each system, so you can't tell how the systems actually differ. In any case, their queries are not the original TPC ones: variables are fixed in the queries, and they run only the first query where the official query is a multi-query.
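To illustrate why a single overall geomean can hide how systems differ, here is a minimal sketch. The runtimes are made up for illustration and do not come from the Fivetran report or any real system; `geomean`, `system_a` and `system_b` are hypothetical names.

```python
import math

def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-query runtimes in seconds (invented for this example).
system_a = [1.0, 1.0, 1.0, 1.0]
system_b = [0.1, 10.0, 0.1, 10.0]

gm_a = geomean(system_a)
gm_b = geomean(system_b)

# Per-query slowdown of B relative to A: this is the detail the summary hides.
ratios = [b / a for a, b in zip(system_a, system_b)]

print(f"geomean A = {gm_a:.3f}, geomean B = {gm_b:.3f}")
print("per-query B/A ratios:", ratios)
```

Both systems end up with essentially the same geomean, even though B is 10x faster on some queries and 10x slower on others, which is why per-query results are needed to compare them.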

Contributors from Altinity ran SSB with both flattened and original schemas [2]. SSB is not well standardized, and we see a lot of pairwise comparisons with controversial results. Generally you can't reproduce them, and you can't get all the results in a single place for the same hardware.

[0] https://www.fivetran.com/blog/warehouse-benchmark
[1] https://www.tpc.org/tpcds/results/tpcds_results5.asp?orderby...
[2] https://altinity.com/blog/clickhouse-nails-cost-efficiency-c...

There is a good reference to the available benchmarks for analytical databases: https://github.com/ClickHouse/ClickBench#similar-projects
On a couple of occasions I've seen TPC-H benchmarks published with a remark that the results are not audited. Is that not possible?
The license states the following. All other modifications are not standardized, so you can't just compare systems. Otherwise there would be another standardized benchmark in the list you propose to run and publish.

>c. Public Disclosure: You may not publicly disclose any performance results produced while using the Software except in the following circumstances: (1) as part of a TPC Benchmark Result. For purposes of this Agreement, a "TPC Benchmark Result" is a performance test submitted to the TPC, documented by a Full Disclosure Report and Executive Summary, claiming to meet the requirements of an official TPC Benchmark Standard. You agree that TPC Benchmark Results may only be published in accordance with the TPC Policies. viewable at http://www.tpc.org (2) as part of an academic or research effort that does not imply or state a marketing position (3) any other use of the Software, provided that any performance results must be clearly identified as not being comparable to TPC Benchmark Results unless specifically authorized by TPC.

I see, thanks for the context, it seems like a PITA.

But given that each database system has its own flavor of SQL, vanilla TPC benchmarks may not work out of the box, so one needs to tweak them a bit. That tweaking might be what actually takes the published results out of the scope of the clauses above.

I'd also guess that a combination of clauses (2) and (3) is what some of those who publish results are relying on.

[1] https://www.oracle.com/mysql/heatwave/performance/
[2] https://www.singlestore.com/blog/tpc-benchmarking-results/
[3] https://docs.pingcap.com/tidb/v6.2/v5.4-performance-benchmar...
[4] https://www.monetdb.org/blogs/learning-from-benchmarking/