Hacker News new | ask | show | jobs
by YetAnotherNick 580 days ago
This two sources[1][2] shows 1500-2500 token/per second on 8*H100.

[1]: https://lmsys.org/blog/2024-07-25-sglang-llama3/?ref=blog.ru...

[2]: https://www.snowflake.com/engineering-blog/optimize-llms-wit...