Hacker News
by root_axis 46 days ago
I have two A100s and have been playing with local models for years. There are definitely moments where they are quite impressive, but the small context sizes and unreliability become obvious immediately.

> For those of us a bit crazy, we are running KimiK2.6, GLM5.1

Yes, those can compare to Opus, but you can't run those unquantized for less than $400k in hardware.

Two Mac Studio M3 Ultra 512GB machines and 1 USB cable can run all those models, for maybe about $30,000 in hardware. Based on my benchmarks, those Mac Studios were twice as fast as the A100s on Deepseek v4 Flash, which is quantized, but not really in a lossy way.
That cannot run KimiK2.6 or GLM5.1, i.e., models within the ballpark of anything offered by frontier companies.
I run KimiK2.6 and GLM5.1 on a system that cost less than $10,000. Granted, I started putting my system together 2 years ago, when things were much cheaper. I run DeepseekV4Flash with 1 million context locally.
Yes it can, but the experience is not great.

A single maxed-out M3 can run a Q2 Kimi 2.6, though that's with heavily degraded perplexity.

2x M3s with RDMA can run a lossless Kimi2.6 at Q4, but with CPU only you would get okayish decode and horrible (1+ minute) TTFT, which wouldn't be a great _interactive_ experience.
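
For anyone who wants to sanity-check the sizing argument, here is a rough back-of-the-envelope sketch. Everything numeric in it is an assumption, not a measurement: the thread never states a parameter count, so the 1,000B figure is a placeholder; the 10% overhead for KV cache and runtime is a guess; and the 500 tokens/s prefill rate is purely illustrative. The function names are made up for this example.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # (params_billion * 1e9 params) * (bits / 8 bytes per param) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead_fraction: float = 0.10) -> bool:
    """True if weights plus an ASSUMED overhead fit in the given memory."""
    needed = weight_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


def ttft_seconds(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Time to first token, approximated as prompt length / prefill throughput."""
    return prompt_tokens / prefill_tokens_per_second


if __name__ == "__main__":
    ASSUMED_PARAMS_B = 1000  # placeholder: a ~1T-parameter model (not from the thread)

    for bits in (2, 4, 16):
        size = weight_gb(ASSUMED_PARAMS_B, bits)
        one_box = fits(ASSUMED_PARAMS_B, bits, memory_gb=512)
        two_boxes = fits(ASSUMED_PARAMS_B, bits, memory_gb=1024)
        print(f"{bits:>2}-bit: ~{size:.0f} GB weights, "
              f"fits 512 GB: {one_box}, fits 2x512 GB: {two_boxes}")

    # Illustrative only: a 30k-token prompt at an assumed 500 tok/s prefill rate.
    print(f"TTFT: ~{ttft_seconds(30_000, 500):.0f} s")
```

Under those assumptions, a 2-bit quant fits on one 512GB machine, a 4-bit quant needs both, and unquantized 16-bit weights fit on neither. A long prompt with slow prefill lands in the one-minute TTFT range. Swap in real parameter counts and measured prefill rates before drawing conclusions.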