Hacker News
by valine, 80 days ago
Yup, exactly. In principle it helps on both fronts: it speeds up inference by reducing memory bandwidth usage, and it shrinks the memory footprint of your KV cache.
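A rough back-of-the-envelope sketch of why those two effects go together. The technique being discussed isn't named in this comment, so this uses a hypothetical stand-in (caching fewer KV heads) and made-up model dimensions and bandwidth. The only thing it relies on is the usual size estimate: 2 tensors (keys and values) × layers × KV heads × head dim × tokens × bytes per element.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer and token."""
    # Factor of 2: one tensor for keys, one for values.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def cache_read_seconds(cache_bytes, bandwidth_bytes_per_s):
    """Lower bound on the time to stream the whole cache once,
    which decoding has to do for each generated token."""
    return cache_bytes / bandwidth_bytes_per_s


# Hypothetical model shape and hardware, for illustration only.
SEQ_LEN = 4096
BANDWIDTH = 1e12  # assumed 1 TB/s of memory bandwidth

full = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=SEQ_LEN)
# Stand-in for "some technique that caches 4x less per token".
small = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=SEQ_LEN)

for name, size in (("full", full), ("reduced", small)):
    ms = cache_read_seconds(size, BANDWIDTH) * 1e3
    print(f"{name}: {size / 2**30:.2f} GiB cache, >= {ms:.2f} ms/token to read it")
```

The same byte count shows up in both numbers: a smaller cache takes less memory and is also less data to pull through memory on every decode step, so in a bandwidth-bound regime the footprint saving and the speedup come from the same reduction.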