by stuntprogrammer
3486 days ago
The AVX512 that has been publicly announced so far does not support fp16, so Skylake Server (SKX) and Knights Landing (KNL) are at a disadvantage here. Intel has not publicly said anything about extensions in Knights Hill (the long-announced successor to KNL).

That said, Intel have announced the emergency "Knights Mill" processor, jammed into the roadmap between KNL and Knights Hill. It's specifically targeted at deep learning workloads, and one might expect FP16 support. They had a bullet point suggesting 'variable' precision too. I would guess that means Williamson style variable fixed point. (I also guess that the Nervana "flexpoint" is a trademarked variant of it.)

I assume the FPGA inference card supports fp16. And Lake Crest (the first Nervana chip, sampling next year) will support flexpoint, of course. I would expect subsequent Xeon / Lake Crest successor integrations to do the same. Fun times.

An aside on the compiler work: I think it's not that hard to emit this instruction, at least for GEMM style kernels where it's relatively obvious.
|
Net-net, data and code need to be structured for AVX to achieve the potential performance gains, and that's 80% of the work.
Once you structure the data and code for AVX, yes, you can use regular C statements, then experiment with optimization flags until the compiler generates the intended instructions (and hasn't introduced excessive register spills). But it's hard to see how that's any easier than using the intrinsics.
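
To make the "structure the data first" point concrete, here is a minimal sketch of what I mean by regular C statements over an AVX-friendly layout. The `points_soa` type and `squared_norms` function are hypothetical names made up for illustration; the only real idea is structure-of-arrays storage plus `restrict` so the compiler can prove the loop is safe to vectorize. Whether it actually emits the instructions you want still depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure-of-arrays layout (illustration only).
   Each field is stored contiguously, so a single vector load can
   pick up several consecutive x values instead of gathering them
   out of interleaved {x, y, z} structs. */
typedef struct {
    const float *x;
    const float *y;
    const float *z;
    size_t n;
} points_soa;

/* Computes out[i] = x[i]^2 + y[i]^2 + z[i]^2 for every point.
   Plain C, no intrinsics: unit-stride accesses and restrict-qualified
   pointers give the auto-vectorizer a fair chance. */
void squared_norms(const points_soa *p, float *restrict out)
{
    assert(p != NULL && out != NULL);

    const float *restrict x = p->x;
    const float *restrict y = p->y;
    const float *restrict z = p->z;

    for (size_t i = 0; i < p->n; ++i) {
        out[i] = x[i] * x[i] + y[i] * y[i] + z[i] * z[i];
    }
}
```

With GCC, something like `-O3 -mavx2 -fopt-info-vec` (or `-Rpass=loop-vectorize` with Clang) will report whether the loop was vectorized, and you still have to read the generated assembly to check for spills. That is the experimenting step above, and it is roughly as much effort as writing the intrinsics once the layout work is done.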