Hacker News
by chihuahua 906 days ago
But if the quality of the response is poor, it's irrelevant that it was generated quickly. If it was using different data to generate higher quality responses, would that not slow it down?

nomel gave a good answer in a different thread

> This is not about the model, it’s about the relative speed improvement from the hardware, with this model as a demo.

To compare apples to apples, look at the tokens per second of other systems running Llama 2 70B 4096. We're by far the fastest!

https://news.ycombinator.com/item?id=38742466
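If you want to run that apples-to-apples tokens-per-second check yourself, here's a rough sketch. It assumes you can stream output tokens from each system as a Python iterable. `generate`, `fake_backend`, `throughput`, and `measure_tokens_per_second` are hypothetical names for illustration, not any provider's API. Keep the model and prompt identical across systems so the number reflects the hardware and serving stack, not the model. Note that this wall-clock figure includes time to first token.

```python
import time
from typing import Callable, Iterable


def throughput(num_tokens: int, seconds: float) -> float:
    """Tokens per second for a single generation run."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / seconds


def measure_tokens_per_second(generate: Callable[[], Iterable[str]]) -> float:
    """Time one streaming generation and return its tokens per second.

    `generate` is a hypothetical callable that streams output tokens from
    whatever backend is being measured; it is not a real vendor API.
    """
    start = time.perf_counter()
    num_tokens = sum(1 for _ in generate())  # consume the stream, counting tokens
    elapsed = time.perf_counter() - start
    return throughput(num_tokens, elapsed)


if __name__ == "__main__":
    # Stand-in backend: emits 50 dummy tokens with a ~1 ms delay each.
    def fake_backend():
        for i in range(50):
            time.sleep(0.001)
            yield f"tok{i}"

    rate = measure_tokens_per_second(fake_backend)
    print(f"{rate:.1f} tokens/s")
```

Run the same measurement against each system serving the same model and compare the numbers. As the parent comment points out, speed alone says nothing about response quality, which is why the model has to be held fixed.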