by lloyd-christmas, 10 days ago
I thought the same thing when I started using locals, but the reality is that, for a given context depth, the token generation speed doesn't change whether it's 128 tokens or 8000. A longer run just lengthens the benchmark run time.
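A rough way to see the arithmetic behind this, as a sketch rather than a real benchmark: assume the per-token latency is fixed at a given context depth (the 0.02 s figure and the `simulate_run` helper are made up for illustration). Under that assumption the reported tokens/s is identical for both lengths and only the wall-clock time grows.

```python
def simulate_run(n_tokens, per_token_latency_s=0.02):
    """Model a generation run where every token costs the same time.

    per_token_latency_s is a hypothetical constant standing in for
    "a given context depth"; it is not measured from any real model.
    """
    total_seconds = n_tokens * per_token_latency_s
    tokens_per_second = n_tokens / total_seconds
    return tokens_per_second, total_seconds


short_rate, short_time = simulate_run(128)
long_rate, long_time = simulate_run(8000)

print(f"128 tokens:  {short_rate:.1f} tok/s over {short_time:.2f} s")
print(f"8000 tokens: {long_rate:.1f} tok/s over {long_time:.2f} s")
```

The rate comes out the same in both cases because the token count cancels out of tokens / (tokens * latency), so the longer run only buys a longer wait.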