Hacker News
by batperson 15 days ago
If you check OpenRouter, there are tons of providers selling API access to open-source LLMs at a fraction of the cost of SOTA models (Codex/Claude). Which model you're serving and what kind of platform you serve it on are big factors.

I'm no expert, but I think eventually we'll have even more specialized ASIC-like machines with models burned into them, and that will absorb a chunk of the market, similar to what happened with crypto mining but to a lesser degree, since the work isn't as static.

2 comments

NN-specific ASICs won't buy you many more FLOPs per watt than GPUs/TPUs will. These chips are already extremely good at NN computation. Sure, you could remove general-purpose shader support and free up 5% of your die for a few more cores (which, by the way, is pretty much what TPUs are), but that's about it.

Either way, you'll still be starving for data.

The best work in this area is memory-integrated Big-Ass-Die or Big-Ass-Chiplet solutions like Cerebras, which park SRAM right next to your cores, not ASICs.
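One rough way to see why a few extra cores don't help while "starving for data" is the roofline model: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below assumes a dense model decoded one token per step with 2-byte (fp16) weights and ignores KV-cache and activation traffic. The peak TFLOP/s and TB/s figures are made-up placeholders, not specs of any real GPU, TPU, or Cerebras part.

```python
# Roofline sketch. All hardware numbers are hypothetical placeholders.

def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float, flops_per_byte: float) -> float:
    """Roofline: min(peak compute, bandwidth * arithmetic intensity).

    TB/s * FLOP/byte gives TFLOP/s, so the units line up.
    """
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


def decode_intensity(batch_size: int, bytes_per_weight: float = 2.0) -> float:
    """Approximate FLOPs per byte for dense-model decoding.

    Each weight is read once per step and used for ~2 FLOPs (multiply + add)
    per sequence in the batch. KV-cache and activation traffic are ignored.
    """
    return 2.0 * batch_size / bytes_per_weight


# Hypothetical accelerator variants.
variants = {
    "base": dict(peak_tflops=1000.0, bandwidth_tb_s=3.0),
    "+5% cores": dict(peak_tflops=1050.0, bandwidth_tb_s=3.0),
    "10x bandwidth": dict(peak_tflops=1000.0, bandwidth_tb_s=30.0),  # e.g. on-chip SRAM
}

if __name__ == "__main__":
    for batch in (1, 64):
        ai = decode_intensity(batch)
        print(f"batch={batch:3d} intensity={ai:5.1f} FLOP/B")
        for name, hw in variants.items():
            print(f"  {name:14s} {attainable_tflops(flops_per_byte=ai, **hw):7.1f} TFLOP/s")
```

With these placeholder numbers, batch-1 decoding reaches about 3 TFLOP/s on both the base and the "+5% cores" variant, and 30 TFLOP/s only when bandwidth goes up tenfold. At batch 64 the bandwidth-rich variant hits the 1000 TFLOP/s compute ceiling, while the other two stay at 192. Real workloads add KV-cache reads and other overheads, so treat this as an illustration of the shape of the argument, not a performance prediction.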

>but I think eventually we'll have even more specialized ASIC like machines with models burned into them

This has already happened and is very interesting.

https://www.anuragk.com/blog/posts/Taalas.html

If that were the case, it would be reasonable to expect that companies like OpenAI or Anthropic, which are heavily indebted, would lose part of their business, not because their models are bad, but because others would be cheaper and good enough.
I think he means they will become commercially relevant, and that most AI compute won't run on GPUs.