by thoughtpolice
3952 days ago
I think the horsepower on those machines shouldn't be underestimated, because they are not entirely as equivalent as you think... I was thoroughly surprised when an unoptimized (but correct!) ChaCha20/8 implementation I wrote on a 3.0GHz POWER8 little-endian machine was about as fast as the latest 3.5GHz Xeons @ AES-256 with AESNI (about 1.3cpb vs 1.0cpb IIRC, but the latter has a dedicated hardware unit for it!). On that same Xeon, the ChaCha20 code only hit somewhere around 5cpb - that's software vs silicon! The POWER8 box also has 170 cores and was actually a QEMU instance (w/ hardware virtualization extensions), vs raw dedicated metal for the Xeon.

If you're doing any kind of numerical or analytic workloads (even databases), I wouldn't throw them aside so quickly. You can even get CUDA for them these days, and certain physical addons like CAPI allow you to map and coherently share physical CPU address space with FPGAs or GPUs. If I could get those things in a reasonable workstation configuration, I'd probably go for it tbh.

(I'd be more than willing to repeat this and post some more accurate numbers if anyone cares. I also need to get around to benchmarking AESNI vs that POWER8 machine's _actual_ dedicated AES unit. The benchmark above was only flexing its vector/integer unit capabilities. ;)
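For anyone wondering why ChaCha is a decent stand-in for "raw vector/integer horsepower": the core of the cipher is nothing but 32-bit adds, XORs and rotates. Here's a rough Python sketch of the standard quarter round plus the cycles-per-byte arithmetic. To be clear, this is not the code I benchmarked; the function names (`rotl32`, `quarter_round`, `cycles_per_byte`) are just ones I picked for illustration, and the cpb helper assumes you measured wall-clock time at a fixed, known clock rate.

```python
MASK32 = 0xFFFFFFFF  # ChaCha works on 32-bit words


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round: only add, xor, rotate. No tables, no S-boxes."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


def cycles_per_byte(clock_hz, nbytes, seconds):
    """Convert a timed run into cpb, assuming the clock stayed at clock_hz."""
    return clock_hz * seconds / nbytes
```

So a run that pushes 3.0e9 bytes in 1.3 seconds on a 3.0GHz part works out to `cycles_per_byte(3.0e9, 3.0e9, 1.3)`, i.e. roughly 1.3cpb. Since the quarter round is all ALU work, the cpb number mostly reflects how many of those adds/rotates the core can retire per cycle, which is the point of the comparison.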