by zargon
373 days ago
> It makes me wonder, how much better is 671B vs 32B?

32B has improved by leaps and bounds in the past year, but DeepSeek 671B is still a night-and-day difference. 671B just knows so much more stuff.

The main issue with RAM-only builds is that prompt ingestion is incredibly slow. If you're going to be feeding in any context at all, it's horrendous. Most people quote their tokens/s with basically non-existent context (a few hundred tokens). Figure out whether you're going to be using context, and how much patience you have. Research the prompt processing and token generation speeds you'll get at your desired context length for each setup, and make your decision based on that.
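
To make that last step concrete, here's a rough back-of-the-envelope sketch. The function name and all the speeds below are made-up placeholders, not measurements of any real build; plug in the prompt processing and generation numbers you actually find for your hardware at your context length.

```python
def response_time_s(prompt_tokens, gen_tokens, pp_tok_s, tg_tok_s):
    """Rough wall-clock estimate: time to ingest the prompt plus time to generate."""
    return prompt_tokens / pp_tok_s + gen_tokens / tg_tok_s


# Placeholder speeds, purely illustrative (NOT benchmarks of any real machine).
PP_TOK_S = 20.0  # assumed prompt processing speed, tokens/s
TG_TOK_S = 5.0   # assumed token generation speed, tokens/s

# Same 500-token reply, with a tiny prompt vs. a 16k-token context.
for prompt_tokens in (300, 16_000):
    secs = response_time_s(prompt_tokens, 500, PP_TOK_S, TG_TOK_S)
    print(f"{prompt_tokens:>6} prompt tokens: ~{secs / 60:.1f} min")
```

With these placeholder speeds the short prompt is dominated by generation time, while the long one is dominated by prompt ingestion. Keep in mind that both speeds usually drop as context grows, so use figures measured at the context length you care about rather than the headline numbers.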