Hacker News
by mgreg 916 days ago
Some details that might interest you are in a SemiAnalysis post [1] published just yesterday. A lot goes into optimizing inference, and there are many dials to turn. One that does seem to have a large impact is batch size, which is a benefit of scale: a provider with more traffic can fill larger batches.
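To make the batch-size point concrete, here is a rough roofline-style sketch (my own toy model, not taken from the SemiAnalysis post). It assumes each decoding step has to read every model weight from memory once no matter how many sequences are in the batch, and that each generated token costs about 2 FLOPs per parameter. It ignores KV-cache traffic, attention cost and latency targets. The model size and hardware figures in `HW` are hypothetical round numbers, not measurements of any real GPU.

```python
# Toy roofline model of one LLM decoding step. All numbers are hypothetical.
# Assumptions: weights are streamed from memory once per step regardless of
# batch size; each token in the batch costs ~2 FLOPs per parameter;
# KV-cache reads and attention FLOPs are ignored.

def step_time_s(batch_size: int, params: float, bytes_per_param: float,
                mem_bw_bytes_s: float, peak_flops: float) -> float:
    """Seconds per decoding step, bounded by either memory traffic or compute."""
    memory_time = params * bytes_per_param / mem_bw_bytes_s  # independent of batch
    compute_time = 2 * params * batch_size / peak_flops      # grows with batch
    return max(memory_time, compute_time)


def tokens_per_second(batch_size: int, **hw: float) -> float:
    """Each step emits one token per sequence in the batch."""
    return batch_size / step_time_s(batch_size, **hw)


# Hypothetical setup: 70B-parameter model in 16-bit weights on one accelerator.
HW = dict(params=70e9, bytes_per_param=2, mem_bw_bytes_s=2e12, peak_flops=300e12)

if __name__ == "__main__":
    for b in (1, 8, 64, 256, 1024):
        print(f"batch {b:>4}: {tokens_per_second(b, **HW):8.0f} tokens/s")
```

Under these assumptions throughput grows roughly linearly with batch size while a step is memory-bound, then flattens once compute becomes the bottleneck (around a batch of 150 with these made-up numbers). A provider with enough concurrent users can keep batches near that point, which is one way scale lowers the cost per token.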

1. https://www.semianalysis.com/p/inference-race-to-the-bottom-...