| If this is the true cost of AI then the future might be dedicated extension cards for computers that hardcode entire models + weights. Downside: you need to buy a new one for each model. Upside: insanely fast inference and zero subscription cost, only one time purchase cost. Once a certain open source model gets good enough this might become viable. Right now the landscape is still shifting too fast. State of the art models might remain on subscription, expensive and might be used by large companies only. State of the art companies might also create their own hardware with hard-baked weights on chip that they don't release to the public, as it might just make more financial sense long term once they "stabilize" on a certain model. |
Having lightning-fast local inference of a super high-quality model would be incredible. If you haven't played with it, check out the demo [1] from Taalas [0].
Honestly, though - I have my doubts. Recurring revenue is just too nice to pass up; I'm sure AI companies wouldn't want me buying a dedicated Opus card and not giving them money for several years until there's something worth upgrading to.
[0] https://taalas.com/
[1] https://chatjimmy.ai/