Hacker News
by bee_rider, 169 days ago
There are also CPU extensions like AVX512-VNNI and AVX512-BF16. Maybe the idea of communicating out to a card that holds your model will eventually go away. Inference is not too memory bandwidth hungry, right?
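
A rough back-of-envelope sketch for the bandwidth question. It assumes batch-size-1 autoregressive decoding where every generated token has to stream all of the model weights from memory once. The function name, the model size, and both bandwidth figures are hypothetical, picked for illustration, not measurements of any real CPU or card.

```python
def decode_tokens_per_second(num_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on tokens/s when each token reads every weight once.

    Assumption: decoding is limited purely by memory bandwidth, with no
    weight reuse across tokens (batch size 1) and no cache effects.
    """
    model_bytes = num_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# Illustrative, assumed numbers (not benchmarks):
PARAMS = 7e9             # hypothetical 7-billion-parameter model
BF16_BYTES = 2           # bytes per weight when stored as BF16
CPU_BANDWIDTH = 100e9    # hypothetical CPU memory bandwidth, bytes/s
CARD_BANDWIDTH = 1000e9  # hypothetical accelerator memory bandwidth, bytes/s

cpu_rate = decode_tokens_per_second(PARAMS, BF16_BYTES, CPU_BANDWIDTH)
card_rate = decode_tokens_per_second(PARAMS, BF16_BYTES, CARD_BANDWIDTH)

print(f"CPU-side bound:  {cpu_rate:.1f} tokens/s")
print(f"Card-side bound: {card_rate:.1f} tokens/s")
```

Under these assumptions the bound scales directly with memory bandwidth, so the answer probably depends on batch size: with larger batches the weights get reused across requests, and raw compute (where extensions like VNNI and BF16 help) starts to matter more.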