Hacker News
by NotGMan 13 days ago
If this is the true cost of AI then the future might be dedicated extension cards for computers that hardcode entire models + weights.

Downside: you need to buy a new one for each model.

Upside: insanely fast inference and zero subscription cost, only one time purchase cost.

Once a certain open source model gets good enough this might become viable.

Right now the landscape is still shifting too fast.

State-of-the-art models might remain subscription-only and expensive, used mainly by large companies.

The companies building state-of-the-art models might also create their own hardware with weights baked into the chip, which they don't release to the public, as that might just make more financial sense long term once they "stabilize" on a certain model.

6 comments

I would shell out cash right now for something like Opus on silicon, like what Taalas [0] has built for Llama 3.1.

Having lightning-speed, local inference of a super high-quality model would be incredible. If you haven't played with it, check out Taalas's demo [1].

Honestly, though - I have my doubts. Recurring revenue is just too nice to pass up; I'm sure AI companies wouldn't want me buying a dedicated Opus card and not giving them money for several years until there's something worth upgrading to.

[0] https://taalas.com/

[1] https://chatjimmy.ai/

Recurring revenue does you no good if it is in fact a recurring loss because your subscription customers use up more than you're charging them.

Of course, expecting the metaphorical Harvard Business School analysts to realize that is asking a lot. Subscriptions are Good and Goodness is Subscriptions, and as with any other mass of people following trends, the preconditions for when subscriptions are good for a business tend to get lost in the frenzy.

If local AI can make financial sense, then cloud AI will make even more financial sense: an AI card can serve a number of users simultaneously, can be utilized 24/7 instead of however many hours per week you use it, and gains other efficiencies at scale. Cloud offerings with privacy and security controls will be available for those who want them, just like non-AI cloud offerings with security/privacy options are available today.
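
To put rough numbers on the utilization argument above, here is a back-of-the-envelope sketch in Python. Every figure in it (card price, lifetime, hours of personal use, number of concurrent users) is a made-up placeholder, not a real price or benchmark, and it deliberately ignores power, cooling, networking, and the provider's margin.

```python
def cost_per_active_user_hour(card_price, lifetime_hours, utilization, concurrent_users):
    """Amortized hardware cost for one hour of one user's active use.

    utilization      -- fraction of the card's lifetime it is actually busy (0..1)
    concurrent_users -- how many users the card serves at once while busy
    """
    busy_hours = lifetime_hours * utilization
    return card_price / (busy_hours * concurrent_users)


# Hypothetical placeholder figures, chosen only to illustrate the shape of the argument.
CARD_PRICE = 2000.0             # assumed purchase price in dollars
LIFETIME_HOURS = 3 * 365 * 24   # assumed three-year useful life

# Local card: one owner using it about 10 hours out of a 168-hour week.
local = cost_per_active_user_hour(
    CARD_PRICE, LIFETIME_HOURS, utilization=10 / 168, concurrent_users=1
)

# Cloud card: busy around the clock, serving 8 users at a time (assumed).
cloud = cost_per_active_user_hour(
    CARD_PRICE, LIFETIME_HOURS, utilization=1.0, concurrent_users=8
)

print(f"local: ${local:.4f} per active hour")
print(f"cloud: ${cloud:.4f} per active hour")
print(f"local/cloud hardware cost ratio: {local / cloud:.1f}x")
```

Under these assumptions the ratio depends only on utilization and sharing, not on the card price or lifetime, which is the core of the argument. Whether a provider passes that saving on to customers is a separate question.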

This seems rather optimistic. Tons of subscriptions people pay for would be cheaper or better run locally, but people want something turnkey that they can set and forget.

The name of the game with businesses right now is also subscriptions. Hell, look at Microsoft. The incentives are not there to let you pay a one-time cost for some hardware.

Or someone works out a hardware architecture that's optimized for AI inference in the way you describe, but also good for 3D graphics.

That would fuse 3D graphics and AI accelerators into one and the same unit, as far as consumer hardware is concerned.

Yes, this is what I'm waiting for. I do hope these cards will not come with any kind of vendor lock-in though.

What kind of vendor lock-in would be possible?

Any kind, really. If Apple can lock down the CPU of an iPhone, then I'm sure it is possible to lock down an LLM chip. The business model may then be that you can buy certain "agent apps" and run them on your LLM card. But I have to stop here because I don't want to give anyone any ideas, though I'm sure they are "creative" enough.

IP-protected models manifested directly in silicon.

Everything we’re using now is the equivalent of building a GPU on an FPGA: the hardware is general purpose at one abstraction level, and that comes with inefficiency at the next layer up. Collapse the levels, gain efficiency at the cost of generality.

The whole premise of what's being described here is to bake the weights into the silicon. That isn't what I'd describe as vendor lock-in, any more than I'd describe a CPU that can only execute ARM instructions as vendor locked.

To answer my own question, I bet they could figure out a way to still bill you per-token, if they wanted to.

Is the lack of portability between x86 and ARM not a form of vendor lock-in?

And of course they could bill per-token, same way cable PPV worked (the bits were already in your house). But the cost structure of weights in silicon means that competitors would be encouraged to compete on this per-token cost, as their marginal cost would be zero.

I don’t see that being a durable business model, but I guess the counterargument is that it’s also similar to game consoles, where the initial hardware is subsidized and the business model assumes ongoing payment for bits.

>Upside: zero subscription cost, only one time purchase cost.

the world has long since moved on from that business model, unfortunately.