by agnokapathetic 921 days ago
Batch size 1 -- this is a terrible benchmark that really only shows memory bandwidth. LLM inference on Llama2-70b is memory-bound up to a batch size of a half dozen or so.
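A rough way to see why, as a back-of-the-envelope roofline sketch and not a measurement: each decode step has to stream all the weights from memory once regardless of batch size, while the arithmetic grows linearly with the batch. The function names and the hardware numbers below are hypothetical placeholders, and the model ignores KV-cache traffic, attention cost, and kernel overheads.

```python
# Simplified roofline model of one decode step (assumptions, not measurements):
#   - weights are read from memory once per step, shared by the whole batch
#   - each generated token costs about 2 FLOPs per parameter
#   - KV-cache reads, attention FLOPs, and launch overheads are ignored


def step_time(params, bytes_per_param, batch, peak_flops, mem_bw):
    """Estimated seconds per decode step: the slower of memory and compute."""
    mem_time = params * bytes_per_param / mem_bw
    compute_time = 2.0 * params * batch / peak_flops
    return max(mem_time, compute_time)


def crossover_batch(peak_flops, mem_bw, bytes_per_param):
    """Batch size at which compute time catches up with memory time."""
    return peak_flops * bytes_per_param / (2.0 * mem_bw)


if __name__ == "__main__":
    # Hypothetical device: 12 TFLOP/s usable compute, 1 TB/s memory bandwidth,
    # 70e9 parameters stored at 1 byte each. Substitute real figures.
    flops, bw, params, bpp = 12e12, 1e12, 70e9, 1.0
    print("crossover batch:", crossover_batch(flops, bw, bpp))
    for b in (1, 4, 8, 16):
        print(b, step_time(params, bpp, b, flops, bw))
```

Below the crossover the step time is flat, so a batch-1 number mostly reports bandwidth divided by model size. Where the crossover actually lands depends entirely on the device's compute-to-bandwidth ratio and the weight precision.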