Hacker News new | ask | show | jobs
by tedsanders 705 days ago
Yeah, to put these prices in perspective: when tokens get this cheap, $1M buys you more than a trillion output tokens.

To earn appreciable revenue at this price, an LLM company needs to be regularly generating multiple internets worth of text.

On the one hand, generating multiple internets of text seems outlandish.

But on the other hand, we're now approaching the point where you can start building LLMs into software without fretting about cost. Now that you can buy ~30 pages for a penny (instead of a dollar) you can really start to throw it into websites, games, search bars, natural language interfaces etc. without every user costing you much.

But small models are not the endgame for these AI companies, as truly general intelligence is a market worth trillions.

What this ~98% cost drop over 2 years hints at is that when AGI does arrive, it might not be horribly expensive.

1 comments

I don't expect organizations to need to generate 1T output tokens, but 1T input tokens is common. Consider developers at a large company running queries with their entire codebase as context. Or lawyers plugging in the entire tax code to ask questions about. Each of them running dozens of queries per day on multi-millions of context input, it's going to add up quick.
Wouldn't a lawyer wanting to run queries against the entire tax code have a model that was fine-tuned on all of that data though? I mean, vs. doing RAG by sending the entire tax code on each request.
Unclear, but fine-tuning has many problems not faced by RAG:

- More prone to hallucinations

- Worse at citing sources for people to double check outputs

- Can't be updated without retraining

- Can't impose knowledge access controls for different users