Hacker News
by chihuahua 907 days ago
> the quality of the response isn't really what we're looking for here. We're looking for speed i.e. tokens per second.

But if it was generating high-quality responses, would that not make it go slower?

1 comment

That would involve using a different model. This is not about the model; it’s about the relative speed improvement from the hardware, with this model as a demo.