by cheptsov 556 days ago
In this one we only used Llama 3.1 405B FP8. We stuck to one model to simplify the setup, since we were mostly looking at the memory saturation effect. So basically we compared inference metrics for the same model. I suppose comparing 3.1 and 3.2 would be difficult, as they are entirely different models. But I'm open to ideas.