Disclaimer: I'm one of the authors of sneller core. We have been working on this project for more than a year. It has a neat AVX512-centered architecture and many neat tricks inside.
Sneller founder here: we do not have any non-AVX code, so we cannot compare directly against that. But generally speaking, our code always works on 16 lanes in parallel per core, which gives a huge speed-up.
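
To make the "16 lanes" idea concrete, here is a rough scalar model in Python. It is not Sneller's code; the names `LANES`, `filter_mask` and `count_matches` are made up for illustration, and it assumes 32-bit lanes (512 bits / 32 = 16). Each step takes a block of up to 16 values and yields a 16-bit mask with one bit per lane. On AVX-512 hardware the inner loop would be a single vector compare that writes a mask register.

```python
LANES = 16  # assumption: 32-bit lanes, so a 512-bit register holds 16 of them


def filter_mask(block, threshold):
    """Compare up to 16 values against threshold; return one mask bit per lane."""
    mask = 0
    for lane, value in enumerate(block):
        if value > threshold:
            mask |= 1 << lane  # set the bit for a lane that passed the filter
    return mask


def count_matches(values, threshold):
    """Walk the input one 16-lane block at a time and count passing lanes."""
    total = 0
    for start in range(0, len(values), LANES):
        block = values[start:start + LANES]  # the last block may be partial
        total += bin(filter_mask(block, threshold)).count("1")
    return total
```

The point of the sketch is the shape of the loop: the work is expressed per block of 16 rather than per value, with a mask tracking which lanes are still active.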