by Aurornis 127 days ago
> With Apple devices you get very fast predictions once it gets going, but it is inferior to nvidia precisely during prefill (processing the prompt/context), before it really gets going

I have a Mac and an nVidia build, and I’m not disagreeing.

But nobody is building a useful nVidia LLM box for the price of a $500 Mac Mini.

You’re also not getting as much RAM as a Mac Studio unless you’re stacking multiple $8,000 nVidia RTX 6000s.

There is always something faster in LLM hardware. Apple is popular for the price points of average consumers.

1 comment

Not many are getting useful inference out of a $500 Mac Mini, because it only has 16GB of RAM.

It depends. This particular model has larger experts with more active parameters, so 16GB is likely not enough (at least not without further tricks), but there are much sparser models where the active experts can sit in RAM while the weights for all the other experts stay on disk. This becomes more and more of a necessity as models get sparser and RAM itself gets tighter. It lowers performance, but the end result can still be "useful".
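
To make the "active expert in RAM, everything else on disk" idea concrete, here is a minimal sketch, not how any particular inference engine does it. It assumes a toy file layout (fixed-size float32 experts stored back to back) and a simple LRU eviction policy. Every name in it (`ExpertCache`, `write_experts`, `FLOATS_PER_EXPERT`, the routing sequence) is hypothetical and chosen only for illustration.

```python
import mmap
import os
import struct
import tempfile
from collections import OrderedDict

FLOATS_PER_EXPERT = 4                     # hypothetical; real experts are far larger
BYTES_PER_EXPERT = FLOATS_PER_EXPERT * 4  # float32 weights


def write_experts(path, num_experts):
    """Write toy weights to disk: expert i is the value float(i), repeated."""
    with open(path, "wb") as f:
        for i in range(num_experts):
            values = [float(i)] * FLOATS_PER_EXPERT
            f.write(struct.pack(f"<{FLOATS_PER_EXPERT}f", *values))


class ExpertCache:
    """Keep at most `capacity` decoded experts in RAM; the rest stay on disk."""

    def __init__(self, path, capacity):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._capacity = capacity
        self._resident = OrderedDict()  # expert id -> weights, in LRU order
        self.disk_reads = 0             # counts cache misses

    def get(self, expert_id):
        if expert_id in self._resident:
            self._resident.move_to_end(expert_id)  # mark as most recently used
            return self._resident[expert_id]
        # Miss: read only this expert's byte range from the mapped file.
        start = expert_id * BYTES_PER_EXPERT
        raw = self._map[start:start + BYTES_PER_EXPERT]
        weights = struct.unpack(f"<{FLOATS_PER_EXPERT}f", raw)
        self.disk_reads += 1
        self._resident[expert_id] = weights
        if len(self._resident) > self._capacity:
            self._resident.popitem(last=False)     # evict least recently used
        return weights

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_experts(path, num_experts=8)
        cache = ExpertCache(path, capacity=2)
        routed = [3, 3, 5, 3, 7, 5]  # hypothetical per-token router choices
        outputs = [cache.get(e)[0] for e in routed]
        reads = cache.disk_reads
        cache.close()
        return outputs, reads
```

With room for only two experts, the six routed lookups above cost four misses. That is the performance hit in miniature: the more often the router repeats recent experts, the fewer trips to disk you pay for.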