Hacker News
by veselin, 1125 days ago
What speed should we expect from the model on consumer hardware? I tried an 8-bit quantized version on a 4090 and it took 13 seconds to generate 100 tokens, which seems a bit slow to me.
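For comparing against other people's numbers, it may help to convert that timing into tokens per second. A minimal sketch of the arithmetic, using only the two figures from the run above (the `throughput` helper is a hypothetical name, not part of any library):

```python
def throughput(tokens: int, seconds: float) -> tuple[float, float]:
    """Return (tokens per second, milliseconds per token) for one generation run."""
    if tokens <= 0 or seconds <= 0:
        raise ValueError("tokens and seconds must be positive")
    return tokens / seconds, 1000.0 * seconds / tokens


# Figures from the comment: 100 tokens generated in 13 seconds.
tok_per_s, ms_per_tok = throughput(100, 13.0)
print(f"{tok_per_s:.1f} tokens/s, {ms_per_tok:.0f} ms/token")
# prints "7.7 tokens/s, 130 ms/token"
```

This assumes the 13 seconds covers generation only; if it also includes model loading or prompt processing, the real decode speed would be somewhat higher.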