Hacker News
by hurrdurr57 714 days ago
Given how quickly AI is progressing from the software side, and how poorly AI scales from just throwing raw compute time at a model, I don't see a company holding onto the lead for very long with that strategy.

If I can come out with a model a year later, and it can provide 95% of the performance while costing 10% as much to run, I think I would end up stealing a lot of customers before they had a chance to break even.
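To put a rough number on that: a quick sketch of performance per unit of running cost, using only the 95% and 10% figures above. The function name and the idea of treating "performance" as a single scalar are assumptions made for illustration, not a real benchmark.

```python
def value_per_cost(relative_performance: float, relative_cost: float) -> float:
    """Performance delivered per unit of running cost.

    Both inputs are relative to the incumbent model, which is 1.0 / 1.0.
    Treating performance as one scalar is a simplifying assumption.
    """
    return relative_performance / relative_cost


incumbent = value_per_cost(1.00, 1.00)
challenger = value_per_cost(0.95, 0.10)  # figures from the comment above

print(f"challenger delivers {challenger / incumbent:.1f}x the performance per unit cost")
```

Under those assumptions the later model delivers roughly 9.5x as much performance per dollar spent on inference, which is the gap the incumbent would have to cover through pricing before breaking even.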

Take Llama3-8B, for example: it's an 8 billion parameter model from 2024 that performs about as well as the original ChatGPT, a 175 billion parameter model from 2022. It only took 2 years before a model that can run on a desktop could compete with a model that required a data center.
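For a sense of why one fits on a desktop and the other doesn't, here is a back-of-the-envelope estimate of the memory needed just to hold the weights. It assumes the parameter counts quoted above and counts only weights (no KV cache, activations, or runtime overhead); the helper name and the precision table are illustrative.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str = "fp16") -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


# Parameter counts as quoted in the comment above.
models = {"8B model": 8e9, "175B model": 175e9}

for name, n_params in models.items():
    sizes = ", ".join(
        f"{precision}: {weight_memory_gb(n_params, precision):.1f} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{name} -> {sizes}")
```

At fp16 that works out to about 16 GB versus about 350 GB of weights, so the smaller model is within reach of a single high-end consumer GPU (more comfortably when quantized), while the larger one has to be split across several accelerators.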

2 comments

LLMs actually scale extremely well just by throwing compute at them. That's the whole reason they took off. Training a bigger model, training it longer, or increasing the dataset all work more or less equally well. Now that we've pretty much saturated the dataset component (at least for human-written text), everyone throws their compute at bigger models or more epochs.
It's totally reasonable to take both bets. It's unclear that the company betting 100B wouldn't also be the company making the 1 MM bet.

If you're MSFT, you don't care who wins as long as you have cost-competitive rights to embed the AI in all of your products earlier than others.