No idea why the people reacting here so far have fixated on the "cheating" versions; it's clear to me they were included mainly to set a maximal-speed baseline/benchmark and are not the main point of the article.
I find the "cheating" versions peculiar because I don't see the purpose of them. What's the point, and in what way is it cheating? It's just a different algorithm, and it doesn't add any useful information to the subject at hand.
Numerical operations in a loop are often subject to aggressive optimisation by C compilers, which makes them tricky to use in benchmarks: are we measuring the intended loop, or has the work been optimised away? Comparisons are often made of "Blub vs C", where the C timing is an order of magnitude smaller, and it's not clear whether that's because C is fast or because the work has been optimised away.
Including an "optimised away" version lets us know when this has happened: the "non-cheating" benchmarks take much longer than the "cheating" ones, so we can assume they've not been optimised away.
I assume the author only went into detail about them because they're independently interesting, regardless of the main topic of the post.
The solution to this is to deliver arguments at runtime, rather than baking them into the program as constants. Describe some computation by a data structure that is delivered at runtime, and see which implementation does best. That way there can be no cheating.
> The solution to this is to deliver arguments at runtime, rather than baking them into the program as constants.
Functions already take their arguments at runtime. Except when they don't, due to optimisation.
For a benchmark to be automated, reproducible, etc., those constants have to be baked in somewhere, even if that's in a Haskell program using FFI (as in the article) or in a shell script. Whilst optimisers don't (yet) cross the language/process boundary, it still makes sense to include such sanity checks, rather than assuming we know what the optimiser will/won't do.
After all, the whole point of a benchmark is to gather evidence to question/ground the assumptions of our mental model. The less we assume, the better. The more evidence we gather, the better.