by Smerity
2003 days ago

I work in the space and was impressed with NN-512, as there's a painful gap in inference cost between CPU and GPU that doesn't have to exist. Intel and AMD are really missing the boat here: most other companies have enough cash that they just go to GPUs, academics rarely sling low-level code even in CUDA, let alone AVX-512, and other than Fabrice Bellard's work, I've seen few go that low level.

My suggestion would be to focus on an initial use case where a very limited, low-cost / high-efficiency CPU model can provide a massive advantage. NN-512 should then be the framework that expands out from that Redis-like core. The limited-use-case tactic is what I'm focusing on[1], mainly because I have a particular application in mind and, having less technical brilliance than yourself, need to focus ;)

An old but still relevant example is the early word2vec work, which was (and still is) frequently better to run on CPUs than on GPUs. A well-tuned implementation is not only advantageous on CPU but can win out in many scenarios where cost / latency / ... are important.

Congrats on the project though! I'd be curious to hear your thoughts on the future if you ever want to chat =]

[1]: Initial experiments written up as a tutorial with Rust and ISPC for a specific CPU-based NN task - https://state.smerity.com/smerity/state/01E8RNH7HRRJT2A63NSX...