by agnokapathetic, 921 days ago
Batch size 1: this is a terrible benchmark that really only shows memory bandwidth. LLM inference on Llama2-70b is memory bound up to a batch size of a half dozen or so.
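To make the reasoning concrete, here is a rough roofline-style sketch. It assumes each decode step reads every weight once regardless of batch size and spends about 2 FLOPs per weight per sequence, and it ignores KV-cache traffic, attention cost, and kernel overhead. The function names and the hardware numbers are hypothetical placeholders, not measurements of any real device; where the crossover actually lands depends on the hardware and the weight precision.

```python
def decode_step_times(n_params, bytes_per_param, batch_size, peak_flops, mem_bandwidth):
    """Return (memory_time, compute_time) in seconds for one decode step.

    Simplified model: weights dominate memory traffic, matmuls dominate compute.
    """
    # Every weight is streamed from memory once per step, whatever the batch size.
    memory_time = n_params * bytes_per_param / mem_bandwidth
    # About 2 FLOPs (multiply + add) per weight for each sequence in the batch.
    compute_time = 2 * n_params * batch_size / peak_flops
    return memory_time, compute_time


def crossover_batch_size(bytes_per_param, peak_flops, mem_bandwidth):
    """Batch size at which compute time equals memory time in the model above."""
    return bytes_per_param * peak_flops / (2 * mem_bandwidth)


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s memory bandwidth (assumed).
    PEAK_FLOPS = 100e12
    MEM_BANDWIDTH = 1e12
    N_PARAMS = 70e9        # Llama2-70b parameter count
    BYTES_PER_PARAM = 2    # 16-bit weights (assumed)

    for batch in (1, 8, 64, 256):
        mem_t, cmp_t = decode_step_times(
            N_PARAMS, BYTES_PER_PARAM, batch, PEAK_FLOPS, MEM_BANDWIDTH
        )
        bound = "memory" if mem_t >= cmp_t else "compute"
        print(f"batch={batch:4d}  memory={mem_t:.4f}s  compute={cmp_t:.4f}s  -> {bound} bound")
```

Under this model the time per step at batch size 1 is set almost entirely by how fast the weights can be streamed, which is why a batch-1 benchmark mostly ranks hardware by memory bandwidth.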