by pama
291 days ago
Agree that the writeup is very wrong, especially for the output tokens. Here is how anyone with enough money to allocate a small cluster of powerful GPUs can decode huge models at scale, at a cost of 0.2 USD per million output tokens. This has been possible for nearly 4 months now: https://lmsys.org/blog/2025-05-05-large-scale-ep/ It has gotten significantly cheaper still since then, thanks to additional code hacks and the move to B200s.
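For a rough sense of how a per-million-token figure like this falls out of GPU rental price and decode throughput, here is a back-of-envelope sketch. The function name, the GPU count, the hourly price, and the throughput are all hypothetical placeholders, not numbers taken from the linked post, and the math assumes the cluster stays fully utilized.

```python
def cost_per_million_tokens(num_gpus: int, usd_per_gpu_hour: float, tokens_per_second: float) -> float:
    """Serving cost in USD per million generated tokens, assuming 100% utilization (hypothetical helper)."""
    usd_per_hour = num_gpus * usd_per_gpu_hour
    million_tokens_per_hour = tokens_per_second * 3600 / 1_000_000
    return usd_per_hour / million_tokens_per_hour


# Hypothetical inputs for illustration only, not figures from the linked post:
# one 8-GPU node rented at 2 USD per GPU-hour, decoding 22,000 output tokens/s in aggregate.
cost = cost_per_million_tokens(8, 2.0, 22_000)
print(f"{cost:.3f} USD per million output tokens")  # prints 0.202 USD per million output tokens
```

The point is just that cost scales with price per GPU-hour divided by aggregate throughput, so anything that raises tokens/s per GPU (better batching, expert parallelism, faster hardware) pushes the per-token price down proportionally.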