by boroboro4 545 days ago
They discuss it in the paper and recommend 32 GPUs (H800 in their case) for the prefill stage and 320 GPUs for the decoding stage.

=)