Hacker News
by wincy 40 days ago
Cool, thanks for the information. I guess they drive prices down by massively parallelizing requests on, say, an 8x H100 server, so the cost is spread across many users. So if I, say, wanted to use it for 8 hours a day, in my theoretical world it'd be too expensive. My work definitely wouldn't pay $100,000 for a server farm, even if it'd give an AI to all our employees. You'd need engineers and colocation space: basically all the problems that companies didn't like and moved to AWS to avoid.
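A back-of-envelope sketch of the amortization intuition above. All figures are hypothetical: the $100,000 price comes from the comment, while the 3-year lifetime and the 50 concurrent request streams are assumptions, and power, space, and staff costs are ignored entirely.

```python
def hourly_cost_per_user(hardware_usd: float, lifetime_years: float,
                         hours_per_day: float, concurrent_users: int) -> float:
    """Amortized hardware cost per user-hour (ignores power, space, and staff)."""
    utilized_hours = lifetime_years * 365 * hours_per_day
    return hardware_usd / (utilized_hours * concurrent_users)

# Hypothetical: one person using a $100k box 8 h/day for 3 years.
solo = hourly_cost_per_user(100_000, 3, 8, 1)
# Hypothetical: a provider keeping the same box busy 24 h/day with 50 batched streams.
shared = hourly_cost_per_user(100_000, 3, 24, 50)
print(f"solo: ${solo:.2f}/h, shared: ${shared:.2f}/h")
# prints "solo: $11.42/h, shared: $0.08/h"
```

Under these assumptions the shared box is 150x cheaper per user-hour (3x the utilized hours times 50 users), which is the effect batching buys a provider.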

Well, $100k was a generous guesstimate for some point in the future when something like Opus 4.7 is old news.

If we think about the near future, something like Kimi 2.6 is roughly on par with Opus 4.6 today, but requires closer to $700k in hardware to run.

Kimi 2.6 is very close to the Opus family in my experience. Also, it absolutely does not require $700k to run locally at interactive speeds. We're talking more in the range of $10k for a slow Q2 quant with degraded perplexity, up to ~$35k for an acceptably fast Q4 with 200k context (near-lossless perplexity).
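A rough sketch of why quantization level drives the hardware bill: weight memory scales with bits per weight. The 1000B total parameter count is a hypothetical placeholder (substitute the model's real figure), and the effective bits per weight for Q2 (~2.5) and Q4 (~4.5) are assumptions, since real quant formats carry some per-block overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in decimal GB.

    Excludes KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8

PARAMS_B = 1000  # hypothetical total parameter count, in billions

# Assumed effective bits per weight for each format.
for label, bits in [("Q2", 2.5), ("Q4", 4.5), ("FP16", 16)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS_B, bits):.1f} GB of weights")
```

This is a lower bound: a 200k-token KV cache and runtime overhead come on top of the weight footprint.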