Y
Hacker News
new
|
ask
|
show
|
jobs
by
Palmik
522 days ago
Inference for single example is memory bound. By doing batch inference, you can interleave computation with memory loads, without losing much speed (up until you cross the compute bound threshold).