|
Have you considered the Framework Desktop setup they mentioned in their announcement blog post[0]? Just marketing fluff, or is there any merit to it? > The top-end Ryzen AI Max+ 395 configuration with 128GB of memory starts at just $1999 USD. This is excellent for gaming, but it is a truly wild value proposition for AI workloads. Local AI inference has been heavily restricted to date by the limited memory capacity and high prices of consumer and workstation graphics cards. With Framework Desktop, you can run giant, capable models like Llama 3.3 70B Q6 at real-time conversational speed right on your desk. With USB4 and 5Gbit Ethernet networking, you can connect multiple systems or Mainboards to run even larger models like the full DeepSeek R1 671B. I'm futsing around with setups, but adding up the specs would give 384GB of VRAM and 512GB total memory, at a cost of about $10,000-$12,000. This is all highly dubious napkin math, and I hope to see more experimentation in this space. There's of course the moving target of cloud costs and performance, so analysing break-even time is even more precarious. So if this sort of setup would work, its cost-effectiveness is a mystery to me. [0] https://frame.work/be/en/blog/introducing-the-framework-desk... |
It will run some big MoEs at a decent speed (eg, Llama 4 Scout 109B-A17B Q4 at almost 20 tok/s). The other issue is its prefill - only about 200 tok/s due to having only very under-optimized RDNA3 GEMMs. From my testing, you usually have to trade off pp for tg.
If you are willing to spend $10K for hardware, I'd say you are much better off w/ EPYC and 12-24 channels of DDR5, and a couple fast GPUS for shared experts and TFLOPS. But, unless you are doing all-night batch processing, that $10K is probably better spent on paying per token or even renting GPUs (especially when you take into account power).
Of course, there may be other reasons you'd want to inference locally (privacy, etc).