by mythz 513 days ago
Someone also got the full Q8 R1 running at 6-8 tok/s on a $6K PC with no GPU: 2x EPYC with 768GB of DDR5 RAM [1].

Will be interesting to see how the value/performance compares with the next-gen M4 Ultra (or Extreme?) and NVIDIA's new DIGITS [2] when they're released.

[1] https://x.com/carrigmat/status/1884244369907278106

[2] https://www.nvidia.com/en-us/project-digits/

4 comments

DIGITS will be $3k and have 128GB of unified memory, so don't we already know that it won't compare well with this rig? 128GB won't be enough to fit the model in memory.

As for Apple, we'll see.
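
For a rough sense of the memory-fit argument above, here is a back-of-the-envelope sketch. The ~671B parameter count for R1 and the figure of roughly 1 byte per parameter for Q8 are assumptions not stated in the thread, and KV cache and runtime overhead are ignored, so treat the result as a lower bound on what is needed.

```python
# Back-of-the-envelope check: do the quantized weights fit in memory?
# Assumptions (not from the thread): R1 has ~671B parameters and Q8
# stores roughly 1 byte per parameter. KV cache and overhead are ignored.

def weights_gb(params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def fits(params_billion: float, memory_gb: float, bytes_per_param: float = 1.0) -> bool:
    """True if the weights alone fit in the given amount of memory."""
    return weights_gb(params_billion, bytes_per_param) <= memory_gb


R1_PARAMS_B = 671  # assumed parameter count, in billions

# Memory sizes quoted in the thread: 128GB (DIGITS) and 768GB (EPYC rig).
for name, mem_gb in [("1x DIGITS", 128), ("2x EPYC rig", 768)]:
    print(f"{name}: {mem_gb} GB -> fits Q8 weights: {fits(R1_PARAMS_B, mem_gb)}")
```

Under these assumptions the weights alone come to several times the 128GB on a single DIGITS box, while the 768GB rig has some headroom left.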

They can be linked, e.g. 2x DIGITS can run 405B models [1]. Won't know what value/performance we can get until they start shipping them in May.

[1] https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwe...

What's the memory bandwidth like compared to the above EPYC setup, which the tweeter claims has "24 channels of DDR5 RAM"?
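
To put the bandwidth question in rough numbers: theoretical peak DRAM bandwidth is channels x transfer rate x bytes per transfer. The sketch below assumes DDR5-4800 and an 8-byte (64-bit) channel, neither of which is stated in the thread; only the 24-channel figure comes from the tweet. The decode-rate bound further assumes about 37GB of weights read per token (roughly 37B active parameters at 1 byte each), which is an assumption about R1's mixture-of-experts design. Real throughput will sit well below these theoretical peaks.

```python
# Rough peak-bandwidth and decode-rate estimate for a memory-bound CPU rig.
# Assumptions (not from the thread): DDR5-4800, 8 bytes per transfer per
# channel, and ~37 GB of weights read per generated token.

def peak_bandwidth_gbs(channels: int, mt_per_s: float, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mt_per_s * 1e6 * bytes_per_transfer / 1e9


def max_tokens_per_s(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed if every token must stream its weights once."""
    return bandwidth_gbs / gb_read_per_token


bw = peak_bandwidth_gbs(channels=24, mt_per_s=4800)  # 24 channels per the tweet
print(f"peak bandwidth: {bw:.1f} GB/s")
print(f"decode upper bound: {max_tokens_per_s(bw, 37.0):.1f} tok/s")
```

The reported 6-8 tok/s sits below this ceiling, as you'd expect once NUMA effects and less-than-peak memory efficiency are taken into account.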
Wow!

6 to 8 tokens per second.

And less than a tenth of the cost of a GPU setup.

Nice! Xeon 6 using AMX-BF16/INT8 instructions should be something like 5 times faster than that...