Hacker News new | ask | show | jobs
by heipei 331 days ago
Counterpoint: I once wrote a paper on accelerating blockciphers (AES et al) using CUDA and while doing so I realised that most (if not all) previous academic work which had claimed incredible speedups had done so by benchmarking exclusively on zero-bytes. Since these blockciphers are implemenented using lookup tables this meant perfect cache hits on every block to be encrypted. Benchmarking on random data painted a very different, and in my opinion more realistic picture.
5 comments

Why not use real world data instead? Grab a large pdf or archive and use that as the benchmark.
A common case is to compress data before encrypting it so random is realistic. Pdf might allow some optimization (how I don't know) that is not representative
Or at least the canonical test vectors. All zeros is a choice.
There are an enormous number of papers like this. I wrote a paper accelerating a small CNN classifier on FPGA and when I compared against the previous SOTA GPU implementation and the numbers were way off from the paper. I did a git blame on their repo and found that after the paper was published they deleted the lines that short-circuited eval if the sample was all 0 (which much of their synthetic data was ¯\_(ツ)_/¯).
I'm sure I'm getting this wrong, but I think I remember someone pointing out that Borland's Turbo Pascal "compiled lines per second" figure no one could even come close to replicating, until someone wrote a legitimate Pascal program that had essentially that number of lines containing only ";", or something along those lines.

It WAS still a great compiler, and way faster than the competition at the time.

Back when Genetic Algorithms were a hot topic I discovered that a large number of papers discussion optimal parameterisation of the the approach (mutation rate, cross-over, populations etc) were written using '1-Max' as the "problem" to be solved by the GA. (1-Max being attempting to make every bit of the candidate string a 1 rather than 0)

This literally made the GA encoding exactly the same as the solution and also very, very obviously favoured techniques that would MAKE ALL THE BITS 1!

This reminds me of this paper:

https://link.springer.com/article/10.1007/s10710-021-09425-5

They do the convergence analysis on the linear system Ax = 0, which means any iteration matrix (including a zero matrix) that produces shrinking numbers will converge to the obvious solution x = 0 and the genetic program just happens to produce iteration matrices with lots of zeros.

Do you have a link to that paper?