Hacker News
by z4y5f3 531 days ago
NVIDIA is likely citing 1 PFLOPS at FP4 with sparsity (they did this for GB200), so that is roughly 128 TFLOPS BF16 dense, or 2/3 of what an RTX 4090 is capable of. I would put the memory bandwidth at 546 GB/s, using the same 512-bit LPDDR5X at 8533 MT/s as the Apple M4 Max.
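The chain of halvings behind these estimates can be checked with a few lines of Python. This is a sketch that assumes the 1 PFLOPS headline is sparse FP4, and that dropping sparsity and each precision step (FP4 to FP8, FP8 to BF16) each halve tensor throughput. Under those assumptions it gives 125 TFLOPS, close to the ~128 quoted (128 is what you get from 1024 / 8). The function names are illustrative only.

```python
# Back-of-the-envelope check of the GB10 estimates.
# Assumptions: the 1 PFLOPS headline is FP4 with structured sparsity, and
# removing sparsity and each widening of the datatype halves throughput.

def dense_bf16_tflops(fp4_sparse_pflops: float) -> float:
    """Convert a sparse-FP4 PFLOPS headline into dense BF16 TFLOPS."""
    tflops = fp4_sparse_pflops * 1000  # PFLOPS -> TFLOPS
    tflops /= 2  # sparse -> dense
    tflops /= 2  # FP4 -> FP8
    tflops /= 2  # FP8 -> BF16
    return tflops


def memory_bandwidth_gbps(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x transfers per second / 8 bits per byte."""
    return bus_width_bits * data_rate_mtps * 1e6 / 8 / 1e9


print(dense_bf16_tflops(1.0))                    # 125.0
print(round(memory_bandwidth_gbps(512, 8533)))   # 546
```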

I can't see how this will work in terms of TDP. Delivering 2/3 of a 4090's performance would take several times more power than can be effectively cooled in the physical form factor of a Mac mini. Either there is severe downclocking at full throttle, or NVIDIA has come up with more low-power design mojo than Apple has been able to muster for the M4 Max.

Based on your evaluation, it sounds like it will run inference at a speed similar to an M4 Max, while also letting "startups" experiment with fine-tuning larger models or using longer context windows.

It's the best "dev board" setup I've seen so far. It might be part of a larger commercial plan on their part, but it definitely hits the sweet spot for home enthusiasts who have been pleading for more VRAM.
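The "similar speed to an M4 Max" conclusion follows from the fact that single-stream token generation is usually limited by memory bandwidth rather than compute: each new token requires reading roughly all of the weights once. A minimal sketch under that assumption (it ignores KV-cache traffic, compute limits, and batching); the 70B-parameter, 4-bit model is a hypothetical example, not a benchmark. Because both machines are estimated at the same 546 GB/s, the bound comes out the same for both.

```python
# Rough upper bound on single-stream decode speed for a bandwidth-bound LLM.
# Assumption: every generated token streams all model weights from memory once.

def max_decode_tokens_per_s(params_billion: float, bits_per_param: float,
                            bandwidth_gbps: float) -> float:
    """Upper bound on tokens/s = memory bandwidth / bytes of weights read per token."""
    weight_gb = params_billion * bits_per_param / 8  # GB of weights
    return bandwidth_gbps / weight_gb


# Hypothetical: a 70B model quantized to 4 bits per weight, at an estimated 546 GB/s.
print(f"{max_decode_tokens_per_s(70, 4, 546):.1f} tokens/s")  # 15.6 tokens/s
```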