Hacker News
by
jensb1
1015 days ago
Could someone explain how token generation speed relates to the latency before the first token is output?
And if anyone has any metrics on latency on a 4090 for the 70B model, that would be very helpful.
2 comments
MezzoDelCammin
1015 days ago
4090 Ti? AFAIK that has never been released.
jensb1
1015 days ago
typo
redox99
1015 days ago
They're unrelated. What matters for first-token latency is prompt processing speed (which is in the high hundreds of tokens per second).
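To make the distinction concrete, here is a rough back-of-the-envelope sketch in Python. It assumes first-token latency is dominated by prompt processing and that the rest of the response is paced by generation speed. The function names and the example figures (800 tokens/s for prompt processing, 20 tokens/s for generation) are hypothetical placeholders, not measurements from a 4090 or a 70B model.

```python
def time_to_first_token(prompt_tokens: int, prompt_tps: float) -> float:
    """Estimate seconds until the first output token appears.

    Assumption: the wait is dominated by prompt processing, so the
    generation speed does not enter into it.
    """
    if prompt_tps <= 0:
        raise ValueError("prompt_tps must be positive")
    return prompt_tokens / prompt_tps


def total_response_time(prompt_tokens: int, output_tokens: int,
                        prompt_tps: float, gen_tps: float) -> float:
    """Estimate seconds for the whole response.

    Prompt processing time plus output tokens at generation speed.
    """
    if gen_tps <= 0:
        raise ValueError("gen_tps must be positive")
    return time_to_first_token(prompt_tokens, prompt_tps) + output_tokens / gen_tps


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    ttft = time_to_first_token(prompt_tokens=800, prompt_tps=800.0)
    total = total_response_time(prompt_tokens=800, output_tokens=100,
                                prompt_tps=800.0, gen_tps=20.0)
    print(f"first token after ~{ttft:.2f} s, full response after ~{total:.2f} s")
```

Changing `gen_tps` alters only the total, while changing the prompt length or `prompt_tps` moves the first-token latency, which is why the two numbers can be treated separately.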