Hacker News
by discobot 919 days ago
Regarding prompt processing and token generation, you are correct.

It makes sense to benchmark them independently. Prompt processing runs in parallel across all prompt tokens and is compute-bound. Token generation is sequential, one token per forward pass, and is bound by memory bandwidth.
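Here is a rough roofline-style sketch of why the two phases behave so differently. It assumes a dense decoder-only model that costs about 2 FLOPs per parameter per token and has to stream every weight from memory once per forward pass. KV-cache traffic and attention FLOPs are ignored. The hardware numbers (100 TFLOP/s, 1 TB/s) and model size (7B params, fp16) are hypothetical, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    peak_flops: float     # hypothetical peak compute, FLOP/s
    mem_bandwidth: float  # hypothetical memory bandwidth, bytes/s


def roofline_time(flops: float, bytes_moved: float, hw: Hardware) -> tuple[float, str]:
    # The slower of the two resources sets the runtime.
    compute_time = flops / hw.peak_flops
    memory_time = bytes_moved / hw.mem_bandwidth
    if compute_time >= memory_time:
        return compute_time, "compute-bound"
    return memory_time, "memory-bound"


def prefill(n_params: float, bytes_per_param: float, prompt_tokens: int, hw: Hardware) -> tuple[float, str]:
    # All prompt tokens go through the model in one batched pass:
    # weights are streamed once, but FLOPs scale with the number of tokens.
    flops = 2 * n_params * prompt_tokens
    bytes_moved = n_params * bytes_per_param
    return roofline_time(flops, bytes_moved, hw)


def decode_step(n_params: float, bytes_per_param: float, hw: Hardware) -> tuple[float, str]:
    # One new token per pass: all weights are streamed again for every token,
    # while the FLOPs are only those for a single token.
    flops = 2 * n_params
    bytes_moved = n_params * bytes_per_param
    return roofline_time(flops, bytes_moved, hw)


if __name__ == "__main__":
    hw = Hardware(peak_flops=100e12, mem_bandwidth=1e12)  # made-up GPU
    n_params, bytes_per_param, prompt_len = 7e9, 2, 512   # made-up 7B fp16 model

    t_pp, kind_pp = prefill(n_params, bytes_per_param, prompt_len, hw)
    t_tg, kind_tg = decode_step(n_params, bytes_per_param, hw)
    print(f"prompt processing: {prompt_len / t_pp:.0f} tok/s ({kind_pp})")
    print(f"token generation:  {1 / t_tg:.1f} tok/s ({kind_tg})")
```

With these made-up numbers, prompt processing is compute-bound and token generation is memory-bound. Their tokens/s differ by roughly two orders of magnitude, so a single combined figure would hide which resource is actually the bottleneck.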