by jauntywundrkind
918 days ago
They seemed exceedingly hard to use well but interestingly capable & full of promise. And they were made in a much more primitive software age. I'd love to hear about what didn't work. OpenMP support seemed ok, maybe, but OpenMP is just a platform; figuring out software architectures that are mechanically sympathetic to the system is hard. It would be so interesting to see what Xeon Phi might have been if we had Calcite or Velox or OpenXLA or other execution engines/optimizers that can orchestrate usage. The possibility of something like Phi seems so much higher now. There's such a consensus around Phi tanking, and yes, some people came and tried and failed. But most of those lessons about why it wasn't working (or was!) never survived the era, never were turned into stories & research that illuminate what Phi really was. My feeling is that most people were staying the course on GPU stuff, and that there weren't that many people trying Phi. I'd like more than the hearsay heaped at Phi's feet to judge by.
|
Math guys came up with a list of algorithms to try for a search engine backend.
What we needed was matrix multiplication and maybe some decision tree walking (that was some time ago; trees were still big back then, and NNs were seen as too compute-intensive for no clear benefit). So we thought that it might be cool to have a tool that would support both. Phi sounded just right for both.
And things written to AVX-512 did work. The software was surprisingly easy to port.
But then comes the usual SIMD/CPU trouble: every SIMD generation wants a little software rewrite. So for both Phi generations we had to update our code. And for things not compatible with the SIMD approach (think tree-walking), it is just a weak x86.
In theory Phis were universal; in practice what we got was: okay number crunching, bad generic compute.
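To make the "SIMD-friendly vs. not" split concrete, here is a toy Python sketch. It is not the code from the project described above; the function names and the dict-based tree layout are made up purely for illustration. The first loop does the same independent multiply-add on every element, which is the shape wide vector units are built for. The second cannot know which node to visit next until the current comparison resolves, so wide vector lanes have little to do.

```python
def dot(a, b):
    # Same operation on every pair of elements, and no iteration needs the
    # result of another one (apart from the final sum). This is the pattern
    # that maps well onto SIMD lanes.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def walk_tree(node, features):
    # Each step depends on the previous comparison: the next node to load is
    # unknown until the branch is taken. That is a serial, branchy dependency
    # chain, so it runs at the speed of a single (slow) scalar core.
    while "leaf" not in node:
        if features[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]


# Hypothetical two-level decision tree, used only for this illustration.
toy_tree = {
    "feature": 0,
    "threshold": 0.5,
    "left": {"leaf": "A"},
    "right": {
        "feature": 1,
        "threshold": 2.0,
        "left": {"leaf": "B"},
        "right": {"leaf": "C"},
    },
}

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(walk_tree(toy_tree, [0.7, 1.0]))
```

Pure Python will not vectorize either loop, of course; the point is only the difference in data dependencies between the two.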
GPU was somewhat similar: the software stack was unstable, and CUDA had not yet materialized as a standard. But every generation introduced a massive increase in available compute. And boy did NVIDIA move fast...
So the GPU situation was: amazing number crunching, no generic compute.
And then there were a few ML breakthrough results that rendered everything that did not look like a matrix multiplication obsolete.
PS: I wouldn't take this story too seriously; details may vary.