Hacker News
by bee_rider 169 days ago
There are also CPU extensions like AVX512-VNNI and AVX512-BF16. Maybe the need to ship work out to a separate card that holds your model will eventually go away. Inference isn't too memory-bandwidth hungry, right?
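One way to sanity-check the bandwidth question is a back-of-envelope estimate. The sketch below assumes single-stream (batch size 1) decoding of a dense model, where every weight is read from memory once per generated token, so memory bandwidth caps the token rate. The function name and all the numbers (parameter count, bandwidth figures) are hypothetical placeholders, not measurements of any real CPU or card.

```python
def max_tokens_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Bandwidth-bound ceiling on decode rate.

    Assumes each weight is read exactly once per generated token
    (batch size 1, dense model, no cache reuse of weights).
    """
    model_bytes = param_count * bytes_per_param
    return mem_bandwidth_bytes_per_s / model_bytes


# Hypothetical, illustrative inputs only.
PARAMS = 7e9                       # assumed model size in parameters
FORMATS = {"bf16": 2, "int8": 1}   # bytes per weight
BANDWIDTHS = {
    "cpu-like (assumed 100 GB/s)": 100e9,
    "card-like (assumed 1000 GB/s)": 1000e9,
}

for device, bandwidth in BANDWIDTHS.items():
    for fmt, width in FORMATS.items():
        rate = max_tokens_per_second(PARAMS, width, bandwidth)
        print(f"{device:32s} {fmt:5s} ceiling: {rate:7.1f} tokens/s")
```

Under these assumptions the ceiling scales linearly with bandwidth and inversely with bytes per weight, so narrower formats like the ones VNNI and BF16 target help on both the compute and the memory side. Batching several requests together amortizes each weight read over more tokens, which changes the picture.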