Hacker News
by alfiedotwtf 42 days ago
If 8x RTX 6000 gets you 20 s before the first token, how are cloud vendors doing this?
1 comment

RTX 6000s are great, but they are several times slower than a real datacenter-grade GPU. They still use GDDR memory rather than HBM, for example, so they have much less memory bandwidth to work with.
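
To put a rough number on the memory point, here is a back-of-envelope sketch. It assumes single-stream decoding is bandwidth-bound, i.e. every generated token has to read the GPU's share of the weights once. The bandwidth and weight figures are round illustrative placeholders, not specs for any particular card, and `decode_tokens_per_second` is just a name made up for this example. It also only covers per-token decode speed; time to first token depends on prompt processing too, which leans more on compute.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, weight_gb: float) -> float:
    """Upper bound on decode speed if each token reads all local weights once."""
    if bandwidth_gb_s <= 0 or weight_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weight_gb


# Illustrative assumptions only (not vendor specs):
GDDR_CLASS_GB_S = 1000.0    # a GDDR workstation card, order of magnitude
HBM_CLASS_GB_S = 3000.0     # an HBM datacenter card, order of magnitude
WEIGHTS_PER_GPU_GB = 40.0   # model shard held on each GPU

gddr = decode_tokens_per_second(GDDR_CLASS_GB_S, WEIGHTS_PER_GPU_GB)
hbm = decode_tokens_per_second(HBM_CLASS_GB_S, WEIGHTS_PER_GPU_GB)

print(f"GDDR-class bound: {gddr:.0f} tok/s")
print(f"HBM-class bound:  {hbm:.0f} tok/s")
print(f"ratio: {hbm / gddr:.1f}x")
```

Plug in the real bandwidth of whichever cards you are comparing; the ratio of the two bounds is simply the ratio of the bandwidths.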