by sosodev
2 hours ago
I don’t have any particular model in mind, sorry. My numbers are just rough estimates based on my experience with a single-node setup. You might need to opt for a 2- or 3-bit quantized model to fit the full context window. KV cache memory consumption, as well as overall performance, will depend heavily on the model’s architecture. Performance will also depend a lot on the inference server you choose and how it’s configured. I suspect a sub-agent running a much smaller model would be the ideal way to get the latest knowledge via web search and summarization. I’m not trying to say that this would be a great experience or that it would really compete with just buying a subscription to the top models. Rather, I just wanted to point out that $300k is an absurd estimate for a trillion-param model meant for personal use.
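
For a rough sense of the arithmetic behind the 2 or 3 bit suggestion, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the layer count, KV head count, head dimension, and context length are hypothetical and don't describe any real model, and the KV cache formula assumes standard multi-head or grouped-query attention with a 16-bit cache. Architectures that compress or window the cache will come out differently, and real quantized files carry some overhead beyond the raw bit count.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage: parameters times bits per weight, converted to bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache for one sequence under plain MHA/GQA attention.

    The leading 2 accounts for storing one key and one value vector
    per KV head, per layer, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


if __name__ == "__main__":
    n_params = 1e12  # "trillion param" model from the discussion

    # Hypothetical architecture numbers, chosen only to make the example concrete.
    kv = kv_cache_bytes(n_layers=61, n_kv_heads=8, head_dim=128,
                        context_tokens=131072)

    for bits in (2, 3, 4):
        w = weight_bytes(n_params, bits)
        print(f"{bits}-bit weights: {w / GIB:.1f} GiB, "
              f"KV cache: {kv / GIB:.1f} GiB, "
              f"total: {(w + kv) / GIB:.1f} GiB")
```

The point is only that the weights scale linearly with bits per weight while the cache scales with context length and architecture, so dropping a bit or two of precision is what frees up room for a long context on a fixed memory budget.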