by nicman23 470 days ago
There is a plateau where you simply need more compute and the M4 cores are not enough, so even if they have enough RAM for the model, the tokens/s is too low to be useful.

For any model that fits in 2x 5090 (2x 32 GB) that's not a problem, so if you do have this problem, RTX is not an option either.
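
A rough way to check the "fits in 2x 32 GB" condition is sketched below. The model sizes, the 0.5 bytes per parameter (roughly 4-bit quantization), and the 8 GB context budget are hypothetical values chosen for illustration, and the check ignores the overhead of splitting a model across two cards.

```python
def fits_in_vram(params_billion, bytes_per_param, context_gb, vram_gb):
    """Return True if the weights plus a context budget fit in the given VRAM."""
    # 1e9 params * bytes per param / 1e9 bytes per GB = GB of weights
    weights_gb = params_billion * bytes_per_param
    return weights_gb + context_gb <= vram_gb


TOTAL_VRAM_GB = 2 * 32  # two 32 GB cards, as in the comment above

# Hypothetical models, both at ~4-bit (0.5 bytes/param) with 8 GB kept for context.
print(fits_in_vram(70, 0.5, 8, TOTAL_VRAM_GB))   # 35 GB weights + 8 GB context
print(fits_in_vram(120, 0.5, 8, TOTAL_VRAM_GB))  # 60 GB weights + 8 GB context
```

With these assumed numbers the first model fits and the second does not, which is the point where neither setup helps unless you have more memory.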

On Apple silicon you can always use MoE models, which work beautifully. On RTX it's kind of a waste, to be honest, to run MoE; you'd be better off running a single dense model (all parameters active) that fills the available memory, leaving enough space for the context.
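
To make the MoE point concrete, here is a back-of-the-envelope sketch. It assumes decoding is limited by how fast the active weights can be read from memory once per token, which is a common simplification and not a benchmark; compute per token also scales with active parameters in a similar way. The bandwidth figure and parameter counts are hypothetical.

```python
def decode_tokens_per_s(active_params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on decode speed if every token reads all active weights once."""
    active_gb = active_params_billion * bytes_per_param
    return bandwidth_gb_s / active_gb


BANDWIDTH_GB_S = 500.0  # hypothetical memory bandwidth, not a measured figure

# Hypothetical dense model: all 70B parameters are active for every token.
dense = decode_tokens_per_s(70, 0.5, BANDWIDTH_GB_S)

# Hypothetical MoE model: large in total, but only 12B parameters active per token.
moe = decode_tokens_per_s(12, 0.5, BANDWIDTH_GB_S)

print(f"dense: {dense:.1f} tok/s, MoE: {moe:.1f} tok/s")
```

Under these assumptions the MoE model needs a lot of memory for its total parameters but does far less work per token, which is why it suits a machine with plenty of RAM and limited compute.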