Hacker News
by zozbot234 36 days ago
> The local rig is not free and requires very large capital expenditures while producing very low token throughput for large models.

Sometimes it really is free though, because the hardware was bought to serve some other existing needs and that capital expense was fully depreciated quite some time ago. Underutilised hardware is essentially ubiquitous.
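
A rough way to see what "free" means here: once the capital cost is sunk, only the not-yet-depreciated book value and running costs still count per token. This is a minimal sketch assuming a deliberately simplified model (electricity is the only running cost; cooling, staff and wear are ignored). The function name and every numeric input are hypothetical, not figures from this thread.

```python
def marginal_cost_per_million_tokens(
    remaining_capex_usd: float,   # undepreciated book value still to be written off (hypothetical input)
    remaining_life_hours: float,  # hours over which that remaining value is written off
    power_kw: float,              # average power draw while generating, in kW
    usd_per_kwh: float,           # electricity price
    tokens_per_second: float,     # sustained generation throughput
) -> float:
    """Dollars per million generated tokens, counting only costs that are not yet sunk.

    For fully depreciated hardware remaining_capex_usd is 0, so in this simplified
    model electricity is the only per-hour cost left.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if remaining_capex_usd > 0 and remaining_life_hours <= 0:
        raise ValueError("remaining_life_hours must be positive when capex remains")
    # Spread any remaining capital charge evenly over the remaining life.
    capex_per_hour = remaining_capex_usd / remaining_life_hours if remaining_capex_usd > 0 else 0.0
    power_per_hour = power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return (capex_per_hour + power_per_hour) / tokens_per_hour * 1_000_000


# Hypothetical inputs: fully depreciated box, 1 kW draw, $0.15/kWh, 10 tokens/s.
print(round(marginal_cost_per_million_tokens(0, 1, 1.0, 0.15, 10), 2))  # prints 4.17
```

The sketch only captures the marginal-cost view; whether that is the right lens depends on whether the hardware really was bought for other needs, as the comment assumes.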

> Within any time budget, you can get many orders of magnitude more large-model tokens off an 8xB200 than off a local rig.

But using that 8xB200 setup to run inference on cheap, non-frontier models is a plain waste. Its highest and best use is in an AI datacenter serving exceptionally smart models like Gemini DeepThink, GPT Pro or Claude Mythos. (If this isn't true, it means that the current level of large-scale investment in frontier, superintelligent AI is misplaced, and you should worry about that, not whether some models are best run on lower-end hardware!)


> Sometimes it really is free though, because the hardware was bought to serve some other existing needs and that capital expense was fully depreciated quite some time ago.

No one has 8xRTX Pro 6000s that have depreciated to zero "quite some time ago."

> But using that 8xB200 setup to run inference on cheap, non-frontier models is a plain waste

From whose perspective? If someone wants to run an open-source model (and plenty do), a provider buying or renting an 8xB200 to serve it cheaply at scale is much better than everyone buying huge amounts of pointless, wasted hardware such as 8xRTX Pro 6000s for $80,000 per person.
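
The sharing argument largely comes down to utilization: straight-line depreciation accrues whether or not the hardware is busy, so a rig that sits idle most of the day carries more capital cost per token than the same rig kept busy. A minimal sketch, assuming straight-line depreciation and constant throughput while busy; the $80,000 price is the figure from the comment, while the function name, service life, utilization values and throughput are hypothetical.

```python
def capex_per_million_tokens(
    hardware_price_usd: float,
    service_life_hours: float,
    utilization: float,                 # fraction of the service life spent generating, in (0, 1]
    tokens_per_second_when_busy: float,
) -> float:
    """Straight-line capital cost per million tokens actually generated (simplified model)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if service_life_hours <= 0 or tokens_per_second_when_busy <= 0:
        raise ValueError("service life and throughput must be positive")
    # Idle hours still consume depreciation, so only busy hours produce tokens to spread it over.
    tokens_generated = tokens_per_second_when_busy * 3600 * service_life_hours * utilization
    return hardware_price_usd / tokens_generated * 1_000_000


# Hypothetical: an $80,000 rig over 3 years at 50 tokens/s, used 5% vs 80% of the time.
life_hours = 3 * 8760
solo = capex_per_million_tokens(80_000, life_hours, 0.05, 50)
shared = capex_per_million_tokens(80_000, life_hours, 0.80, 50)
print(f"solo: ${solo:.2f}/M tokens, shared: ${shared:.2f}/M tokens, ratio {solo / shared:.0f}x")
```

Capital cost per token scales inversely with utilization, which is the economic core of "shared at scale beats one rig per person"; real comparisons would also need the actual throughput of each setup on the model in question.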