Hacker News
by veselin 1125 days ago
What speed should we expect from the model on consumer hardware? I tried an 8-bit quantized version on a 4090 and it took 13 seconds to generate 100 tokens, which seems a bit slow to me.
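For comparison with other people's numbers, here is a quick sketch that converts that measurement into tokens per second and per-token latency. The helper name `tokens_per_second` is just something made up for illustration, and it assumes the 13 seconds covers only generation (not model load or prompt processing), which I can't be sure of.

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Return generation throughput; hypothetical helper, not from any library."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


# Figures from the comment: 100 tokens generated in 13 seconds.
rate = tokens_per_second(100, 13.0)
ms_per_token = 1000.0 / rate

print(f"{rate:.1f} tokens/s, {ms_per_token:.0f} ms per token")
```

So the run above is a bit under 8 tokens per second, or roughly 130 ms per token.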