by discobot
919 days ago
Regarding prompt processing and token generation, you are correct: it makes sense to benchmark them independently, since prompt processing handles all the prompt tokens in parallel and is compute-bound, while token generation is sequential and bound by memory bandwidth.
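
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes generation has to read every weight once per token, and that a forward pass costs about 2 FLOPs per parameter per token. The function names and the hardware figures are made up for illustration, not measurements of any real device, and real throughput will come in below both ceilings.

```python
def decode_tokens_per_s(model_bytes: float, mem_bandwidth_bytes_per_s: float) -> float:
    """Upper bound on token generation speed.

    Assumption: each generated token streams all model weights from
    memory once, so memory bandwidth is the limit.
    """
    return mem_bandwidth_bytes_per_s / model_bytes


def prefill_tokens_per_s(n_params: float, flops_per_s: float) -> float:
    """Upper bound on prompt processing speed.

    Assumption: about 2 FLOPs per parameter per token, and prompt tokens
    are batched so the weights are reused and compute is the limit.
    """
    return flops_per_s / (2 * n_params)


if __name__ == "__main__":
    # Hypothetical setup: 7B parameters at ~4 bits per weight (~3.5 GB),
    # 100 GB/s memory bandwidth, 10 TFLOP/s of usable compute.
    n_params = 7e9
    model_bytes = 3.5e9
    bandwidth = 100e9
    compute = 10e12

    print(f"generation ceiling: {decode_tokens_per_s(model_bytes, bandwidth):.1f} tok/s")
    print(f"prompt ceiling:     {prefill_tokens_per_s(n_params, compute):.1f} tok/s")
```

The two ceilings depend on different hardware numbers, so one benchmark figure that mixes both phases would hide which resource is actually the bottleneck.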