Hacker News new | ask | show | jobs
by bilbo0s 39 days ago
This.

I’ve begun to suspect that most people are probably running different hardware. Sure, you run the latest deep flash on your brand new M5 128G maybe you get acceptable performance?

But honestly, how many people have an extra $9000 laying around these days?

Right now, running with acceptable performance is kind of a luxury. I wish the people who always say - “This is great!” - would realize that not everyone has their hardware.

1 comments

Actually even with a 9k hardware you won't get good enough performance. There is an interesting video from antirez on trying to run deepseek v4 flash 2bits on a m3 max 128GB ... and the result is kind delusional: as soon as the context start growing you are around 20token/s.
Prefill performance used to be the real bottleneck on antirez's DS4 and that's been greatly improved by now, it doesn't perceivably slow down with growing context.