by zargon 373 days ago

> It makes me wonder, how much better is 671B vs 32B?

32B has improved by leaps and bounds in the past year. But compared with DeepSeek 671B it's still night and day. 671B just knows so much more stuff.

The main issue with RAM-only builds is that prompt ingestion is incredibly slow. If you're going to be feeding in any real amount of context, it's horrendous. Most people quote their tokens/s with basically non-existent context (a few hundred tokens). Figure out whether you're going to be using context, and how much patience you have. Then research the prompt processing and token generation speeds you'd get at your desired context length for each setup you're considering, and make your decision based on that.
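
To put rough numbers on "how much patience you have", here's a back-of-the-envelope sketch. The function name and every figure in it are made-up placeholders, not benchmarks of any real build; plug in the prompt processing and generation speeds you actually find for your target context length.

```python
def response_time_s(prompt_tokens, output_tokens, pp_tok_per_s, tg_tok_per_s):
    """Estimate wall-clock seconds for one request.

    Total time = time to ingest the prompt + time to generate the reply.
    Both speeds should be the ones measured at your desired context length.
    """
    if pp_tok_per_s <= 0 or tg_tok_per_s <= 0:
        raise ValueError("speeds must be positive")
    return prompt_tokens / pp_tok_per_s + output_tokens / tg_tok_per_s


# Hypothetical example: 16k tokens of context, a 500-token reply,
# 20 tok/s prompt processing, 5 tok/s generation (illustrative values only).
t = response_time_s(16_000, 500, pp_tok_per_s=20.0, tg_tok_per_s=5.0)
print(f"{t:.0f} s total, about {t / 60:.1f} min")  # 900 s total, about 15.0 min
```

In that made-up case, 800 of the 900 seconds are spent just reading the prompt, which is why a tokens/s figure quoted with a few hundred tokens of context tells you so little.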