by throwawayffffas 10 days ago
He meant prompt eval time, but have a look at these guys: https://www.youtube.com/watch?v=ndSA9T5yvmM

Over 2,500 tokens per second on a single request, running on 8 MI300X GPUs.
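
For a sense of scale, here is a rough back-of-the-envelope sketch of what that rate means per token. It assumes the 2,500 tokens/s figure is steady generation throughput for one request and treats it as a lower bound; the 1,000-token response length and the function names are made up for illustration, not taken from the video.

```python
# Back-of-the-envelope arithmetic for the quoted throughput figure.
# Assumption: 2500 tokens/s is sustained generation speed on a single request.

def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time between tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a constant rate, in seconds."""
    return num_tokens / tokens_per_second


TOKENS_PER_SECOND = 2500.0  # figure quoted in the comment ("over 2500")
RESPONSE_TOKENS = 1000      # hypothetical response length, for illustration only

latency_ms = per_token_latency_ms(TOKENS_PER_SECOND)
total_s = generation_time_s(RESPONSE_TOKENS, TOKENS_PER_SECOND)

print(f"at most {latency_ms:.2f} ms per token")
print(f"at most {total_s:.2f} s for {RESPONSE_TOKENS} tokens")
```

So at that rate each token takes at most about 0.4 ms, and a 1,000-token reply would finish in under half a second, not counting prompt eval time.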