by ls612 23 days ago
Yes, running at scale is more efficient in $/tok than running just for yourself. Everyone selling DeepSeek V4 inference is selling an undifferentiated good. They have run the numbers on what it costs and are competing against a dozen other outfits also selling undifferentiated open-weights tokens. Whatever it costs them to rent those GPUs is about what they can charge in a competitive market. That is great for you and me, because we can buy tokens at pretty much exactly what they cost to produce.
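A rough sketch of the $/tok arithmetic, in case it helps. Every number here is a made-up placeholder, not a measurement: the GPU rental rate, both throughput figures, and the helper name `cost_per_million_tokens` are assumptions for illustration only. The point is just that a fixed hourly GPU cost gets divided by however many tokens you push through it.

```python
def cost_per_million_tokens(gpu_dollars_per_hour: float, tokens_per_second: float) -> float:
    """Dollar cost to generate one million tokens on rented hardware."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only to show the shape of the calculation.
GPU_RATE = 2.0            # assumed rental price, $/hour
SOLO_THROUGHPUT = 50      # assumed tokens/s serving a single user
BATCHED_THROUGHPUT = 2000 # assumed tokens/s with many users batched together

solo = cost_per_million_tokens(GPU_RATE, SOLO_THROUGHPUT)
batched = cost_per_million_tokens(GPU_RATE, BATCHED_THROUGHPUT)

print(f"solo:    ${solo:.2f} per 1M tokens")
print(f"batched: ${batched:.2f} per 1M tokens")
```

With the same hourly rate, cost per token falls in proportion to throughput, which is why a provider batching many users can undercut someone running the model for themselves. In a competitive market the price should sit near the batched figure.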
2 comments

They are selling it below cost and training on your tool calling, and potentially all your data. They're selling it cheap to get your data, dumbass.
Whoever purchased their RAM last month vs this month has the advantage, I suspect.