|
|
|
|
|
by kbolino
795 days ago
|
|
The builtin benchmarking tool in Go runs with parallelism set to 2x the number of cores. That should make evident any code that scales poorly. It is a tunable parameter though; I can try 4x cores etc. I also ran it on x86 (AMD Ryzen 5600X) and got similar results; everything ran faster on a 4.6 GHz chip, but ChaCha8 stayed around 2 cpb while PCG improved a little to about 0.7 cpb. I do have a ~10-year-old Celeron NUC laying around that I could use to test older/lower-end hardware. I can also publish the (admittedly very simple) program I'm using. It's also worth noting that cpb is still a measure of average throughput not variance or latency and while Go's buffered ChaCha8 implementation amortizes to 2 cpb, when the 32-element buffer exhausts, generating a new number is much more expensive than the previous 31 numbers. I wouldn't say the performance is good enough for every use case. |
|
Testing on Celeron J3455 @ 1.5 GHz (4 physical and logical cores) gave me PCG at 1.2 cpb and ChaCha8 at 2.6 cpb with cpu=1, but PCG stayed relatively constant across cpu=1,2,4,8 (worst was 1.8 cpb) while ChaCha8 slowed to 6.5 cpb at cpu=2 and 7.5 cpb at cpu=4 and cpu=8.
Back on my M1 Mac (8 logical and physical? cores), both ChaCha8 and PCG generally got better with more cores. ChaCha8 got down to 0.76 cpb at cpu=4 (then regressed a bit at cpu=8) while PCG got down to 0.26 cpb at cpu=8.
I don't think any of these results rule ChaCha8 out completely, though again I'm looking from the perspective of video games, which generally monopolize a machine while running.