by telotortium
1169 days ago
Probably a combination of (a) ML framework people not paying much attention to CPU inference, since they already have GPUs/TPUs lying around for training and CPU inference is just for very quick experiments; (b) research code has never been the best optimized for performance; and (c) ML people are not generally systems programmers, and a lot of systems programmers are afraid to mess with ML code outside of low-level computation kernels (it doesn't help that ML code is notoriously unreproducible).