Hacker News
by igouy 47 days ago
Seems like the benchmarks game didn't say that anything interesting about long running programs was measured? And didn't say that "interesting" memory management was measured. And didn't say

I suppose when you write "because it compares different algorithms" you didn't say that there were no comparisons based on the same algorithm.

We've certainly not attempted to prove that these measurements, of a few tiny programs, are somehow representative of the performance of any real-world applications — not known — and in-any-case Benchmarks are a crock.


The problem with benchmarks isn't that they themselves are lying. Benchmarks always tell the truth - about themselves. The problem is in the conclusions people draw from them. In the nineties, benchmarks were still somewhat extrapolatable because we could say X is slow and Y is fast, as many operations had an intrinsic cost. These days, almost no benchmark (certainly no microbenchmark) is extrapolatable to anything besides itself. Is a branch slow or fast? That depends on what the program did before and what it intends to do later. Is memory access slow or fast? Ditto. Function call? Allocation? They're all so context-dependent now that benchmarks of some mechanism are useful only to the authors of that mechanism, who know exactly how it works, what exactly is being measured, and what can be extrapolated from that.

If I write a malloc benchmark I may think, oh, this measures the cost of malloc/free. In reality, it only measures the cost for a program whose concurrency, allocation/deallocation patterns, and duration match exactly what I wrote, and the result may bear little resemblance to the numbers I'd get if any of those were different.
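
To make that concrete, here is a minimal sketch, in Python rather than C and with hypothetical names (`time_pattern`, "immediate", "batched"), of two "allocation benchmarks" that perform the same number of same-sized allocations but release them in different patterns. It assumes CPython, where dropping the last reference releases an object right away. Each timing is only true of its own pattern; nothing here claims which one is faster or by how much.

```python
import time


def time_pattern(pattern, n=50_000, size=64):
    """Return the seconds taken to allocate and release n buffers of `size` bytes.

    `pattern` is a hypothetical knob for this sketch:
      "immediate" - release each buffer right after allocating it
      "batched"   - keep all buffers alive, then release them together
    """
    start = time.perf_counter()
    if pattern == "immediate":
        for _ in range(n):
            buf = bytearray(size)
            del buf  # last reference dropped; CPython releases it here
    elif pattern == "batched":
        live = [bytearray(size) for _ in range(n)]
        del live  # all n buffers are released at this point
    else:
        raise ValueError(f"unknown pattern: {pattern!r}")
    return time.perf_counter() - start


if __name__ == "__main__":
    # Same allocation count and size, different lifetimes: two different answers.
    for pattern in ("immediate", "batched"):
        print(pattern, time_pattern(pattern))
```

Changing `n`, `size`, the lifetime pattern, or running several threads gives yet another number, and none of them is "the cost of allocation".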

So I'm not saying that the Benchmark Game is lying. It is telling the truth about how long those programs ran. It's just that what we can generalise from those benchmarks is even less than what we can from more "interesting" ones, but given that even that is close to nothing anyway, maybe it doesn't matter.

It is telling the truth about how long those programs ran, period.

There seem to be people who find those brute facts surprising in themselves.

All benchmarks tell the truth about themselves. That has never been what makes benchmarks good or bad. The worst and best benchmarks ever made are both truthful about their results.

But a good benchmark suite is one that covers a variety of different problems and/or programs similar to a significant portion of production software. The Benchmark Game is neither, plus it's confusing because it often compares things that measure the sophistication of the algorithm while making it seem that it measures something about a language (you don't need to be deceitful to confuse). So no, I don't think it's a good benchmark suite at all.

> The Benchmark Game is neither.

And makes no claim to be.

Here's something that could reasonably make those claims:

https://dl.acm.org/doi/10.1145/3669940.3707217

Oh! It's only Java.

> And makes no claim to be.

I know. I don't understand why you think I have a problem with the site's honesty. It's a poor benchmark suite, and it admits it is. We're in agreement.

> Here's something that could reasonably make those claims

I'm not familiar with this paper, but you seem to think I was complaining about false claims, which I wasn't. Benchmarks are problematic these days because results no longer generalise as they did a couple of decades ago, but some benchmarks are of higher quality than others (again, I'm not talking about what they say they are but about what they actually are) by at least covering a wider and possibly more relevant set of use cases, and by offering comparisons that are less confusing.

> I'm not familiar with this paper…

It presents "DaCapo Chopin, a major release of the DaCapo benchmark suite for Java". It's a benchmark suite. It says so.

> I'm not talking about what they say they are but about what they actually are

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”

“The question is,” said Alice, “whether you can make words mean so many different things.”

“The question is,” said Humpty Dumpty, “which is to be master—that's all.”