by pron 46 days ago
All benchmarks tell the truth about themselves. That has never been what makes benchmarks good or bad. The worst and best benchmarks ever made are both truthful about their results.

But a good benchmark suite is one that covers a variety of different problems and/or programs similar to a significant portion of production software. The Benchmark Game is neither, plus it's confusing because it often compares things that measure the sophistication of the algorithm while making it seem as though it measures something about a language (you don't need to be deceitful to confuse). So no, I don't think it's a good benchmark suite at all.


> The Benchmark Game is neither.

And makes no claim to be.

Here's something that could reasonably make those claims:

https://dl.acm.org/doi/10.1145/3669940.3707217

Oh! It's only Java.

> And makes no claim to be.

I know. I don't understand why you think I have a problem with the site's honesty. It's a poor benchmark suite, and it admits it is. We're in agreement.

> Here's something that could reasonably make those claims

I'm not familiar with this paper, but you seem to think I was complaining about false claims, which I wasn't. Benchmarks are problematic these days because results no longer generalise as they did a couple of decades ago, but some benchmarks are of higher quality than others (again, I'm not talking about what they say they are but about what they actually are) by at least covering a wider and possibly more relevant set of use cases, and by offering comparisons that are less confusing.

> I'm not familiar with this paper…

It presents "DaCapo Chopin, a major release of the DaCapo benchmark suite for Java". It's a benchmark suite. It says so.

> I'm not talking about what they say they are but about what they actually are

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”

“The question is,” said Alice, “whether you can make words mean so many different things.”

“The question is,” said Humpty Dumpty, “which is to be master—that's all.”

I don't understand what you're trying to say. I said that the Benchmark Game is not a good benchmark suite in the sense that it does not measure language speed differences, since 1. it compares different algorithms, and 2. it doesn't cover some of the most important use cases that languages/runtimes optimise for [1]. That's all. I'm not saying it's deceitful, I'm saying it's just not a good comparison of language speeds. Are you agreeing or disagreeing?

[1]: In particular, Java was designed to overcome some of the biggest performance issues of low-level languages that have plagued a large number of applications: memory management when objects are of varying sizes and lifetimes, concurrency (especially lock-free data structures), and dynamic dispatch, which grows in use as applications grow in size and complexity. Not a single one of these is covered in the Benchmark Game, which focuses on small, very regular, batch workloads, the very things that low-level languages have always been good at, and none of the areas where the performance of low-level languages has traditionally (and to this day) suffered and which led to different compiler and memory management designs.

> it's just not a good comparison of language speeds

It's not that the benchmarks game is not a good benchmark suite, it isn't a benchmark suite.

It's not that the benchmarks game is not a good comparison of language speeds, it's that comparison of "language speeds" is so under-specified as-to-be wishful thinking.

> Java was designed to…

"… build software for the next generation of consumer electronics – think smart toasters, interactive TVs, and other futuristic gadgets." Things change.

> … the very things that low-level languages have always been good at…

Which is why some people find it somewhat surprising that those kinds of Java programs are in any way comparable.

> It's not that the benchmarks game is not a good benchmark suite, it isn't a benchmark suite.

OK, but I was responding to someone who did consider it to be a benchmark suite. As long as we agree it's not a good benchmark suite whatever it considers itself to be, we're in agreement.

> It's not that the benchmarks game is not a good comparison of language speeds, it's that comparison of "language speeds" is so under-specified as-to-be wishful thinking.

With that I completely agree. But if you group results by language, that's exactly what you're inviting, and if your suite of benchmarks or whatever you want to call it covered a wider range of problems, that point could be more easily seen. Let's say that the combination of grouping results by language and covering only a very narrow (and niche) set of problems that also happens to be the sweet spot of some languages that have other significant performance failings in other use cases doesn't exactly help people get the right impression.