|
|
|
|
|
by 112233
358 days ago
|
|
I can run full deepseek r1 on m1 max with 64GB of ram. Around 0.5 t/s with small quant. Q4 quant of Maverick (253 GB) runs at 2.3 t/s on it (no GPU offload). Practically, last gen or even ES/QS EPYC or Xeon (with AMX), enough RAM to fill all 8 or 12 channels plus fast storage (4 Gen5 NVMEs are almost 60 GB/s) on paper at least look like cheapest way to run these huge MoE models at hobbyist speeds. |
|