Hacker News new | ask | show | jobs
by Palmik 522 days ago
Inference for single example is memory bound. By doing batch inference, you can interleave computation with memory loads, without losing much speed (up until you cross the compute bound threshold).