by aurareturn
475 days ago
It's good enough to run whatever local model you want. Two 80-core GPUs are no joke. Linking them together gives effectively 1.6 TB/s of memory bandwidth and 1 TB of total memory. You can run the full DeepSeek 671B q8 model at around 40 tokens/s, or the q4 model at around 80 tokens/s, because R1 is MoE and only 37B params are active at a time. Linking two of these together lets you run a model more capable (R1) than GPT-4o at a comfortable speed at home. That was simply fantasy a year ago.
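For anyone who wants to check the arithmetic, here is a rough back-of-envelope sketch. It assumes decoding is purely memory-bandwidth bound, so each generated token reads every active parameter once, and it ignores KV cache traffic, interconnect overhead, and compute limits. The function and constant names are made up for illustration; the inputs are the figures from the comment above. Treat the results as upper bounds, not benchmarks.

```python
# Back-of-envelope estimate, assuming decode speed is limited only by
# how fast the active weights can be streamed from memory.

def decode_tokens_per_second(bandwidth_bytes_per_s, active_params, bits_per_param):
    """Upper bound on tokens/s if each token reads all active weights once."""
    bytes_per_token = active_params * bits_per_param / 8
    return bandwidth_bytes_per_s / bytes_per_token

def weights_size_bytes(total_params, bits_per_param):
    """Memory needed just to hold the weights at a given quantization."""
    return total_params * bits_per_param / 8

BANDWIDTH = 1.6e12      # 1.6 TB/s combined, as claimed above
TOTAL_MEMORY = 1e12     # 1 TB combined, as claimed above
TOTAL_PARAMS = 671e9    # full R1 parameter count
ACTIVE_PARAMS = 37e9    # params active per token (MoE)

q8 = decode_tokens_per_second(BANDWIDTH, ACTIVE_PARAMS, 8)
q4 = decode_tokens_per_second(BANDWIDTH, ACTIVE_PARAMS, 4)
fits_q8 = weights_size_bytes(TOTAL_PARAMS, 8) <= TOTAL_MEMORY

print(f"q8: {q8:.0f} tokens/s, q4: {q4:.0f} tokens/s, q8 fits in memory: {fits_q8}")
# prints "q8: 43 tokens/s, q4: 86 tokens/s, q8 fits in memory: True"
```

Under those assumptions the ceiling comes out slightly above the 40 and 80 tokens/s quoted, which is about what you'd expect once real-world overhead is subtracted.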