by stochastic_monk 2955 days ago
I was looking at [0] for that number. You're right, it's closer to 3 than 4; I must have rounded ~12k down to 10k and 37k up to 40k. I could also imagine other factors speeding it up further. AVX2 was missing a number of instructions that AVX-512 has filled in, which could play a role.
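A quick check of where the 3-vs-4 discrepancy comes from. This assumes the ~12k and ~37k figures are the two throughput numbers read off [0] (units as reported there; only their ratio matters):

```python
# Speedup ratio from the figures as read (~12k and ~37k) versus after
# rounding them to 10k and 40k. Units are whatever [0] reports.
exact_ratio = 37_000 / 12_000    # ~3.08, i.e. "closer to 3"
rounded_ratio = 40_000 / 10_000  # 4.0, the mis-rounded estimate

print(f"as read: {exact_ratio:.2f}x")    # as read: 3.08x
print(f"rounded: {rounded_ratio:.2f}x")  # rounded: 4.00x
```

Rounding the denominator down and the numerator up both push the ratio upward, so the two rounding errors compound rather than cancel.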

Thanks for the heads-up RE: BLIS, I'd forgotten about them; it's probably the best option, especially considering its open source status.

[0] https://github.com/xianyi/OpenBLAS/issues/991#issuecomment-3...