HN Mirror


	by bkettle 251 days ago
	This free tradition in software is, I think, one of the things I love most about it, but I don't see how it can continue with LLMs, given the extremely high training costs and the powerful hardware required for inference. It just seems like writing software will necessarily require paying rent to the LLM hosts to keep up. I guess it's possible that we'll figure out a way to do local inference that is as accessible to everyone as most other modern software tools are, but the high training costs make that seem unlikely to me. I also worry that as we rely on LLMs more and more, we will stop producing the kind of tutorials and other content aimed at beginners that make it so easy to pick up programming the manual way.

3 comments

levocardia 251 days ago

There's a Stephen Boyd quote that's something like "if your optimization problem is too computationally expensive, just go on vacation to Greece for a few weeks and by the time you get back, computers might be fast enough to solve it." With LLMs there's sort of an equivalent situation with cost: how mind-blowing would it have been to be able to train this kind of LLM at all even just 4 years ago? And today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.
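To put a rough number on the $100 to $10 idea, here is a small back-of-the-envelope sketch. The 3-year horizon is an assumption standing in for "a few years", and the function name is made up for illustration; it only shows what constant yearly price drop that scenario would imply.

```python
# Hypothetical arithmetic: what yearly price drop would take $100 of compute to $10?
# The 3-year horizon is an assumption standing in for "a few years".

def annual_cost_factor(cost_now, cost_later, years):
    """Constant yearly multiplier that turns cost_now into cost_later after `years`."""
    return (cost_later / cost_now) ** (1.0 / years)

factor = annual_cost_factor(100.0, 10.0, 3)
print(f"cost multiplier per year: {factor:.3f}")
print(f"yearly price drop: {(1 - factor) * 100:.0f}%")
```

Under that assumption the cost would need to fall by a bit more than half every year; a longer horizon needs a gentler yearly drop.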

There's also a reasonable way to "leapfrog" the training cost with a pre-trained model. So if you were doing nanochat as a learning exercise and had no money, the idea would be to code it up, run one or two very slow gradient descent iterations on your slow machine to make sure it is working, then download a pre-trained version from someone who could spare the compute.
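A minimal sketch of the "run one or two very slow iterations to make sure it is working" step. This uses a toy one-parameter linear model in plain Python, not nanochat itself; the data, learning rate, and function names are all illustrative assumptions. The only check is that the loss goes down after a couple of gradient descent steps.

```python
# Toy sanity check: a couple of gradient descent steps should reduce the loss.
# Everything here (data, model, names) is hypothetical and stands in for a real training loop.

def loss_and_grad(w, xs, ys):
    """Mean squared error of the model y = w * x, and its gradient with respect to w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad

def sanity_check(steps=2, lr=0.05):
    """Run a few gradient steps and return the loss before each step plus the final loss."""
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]  # generated with w = 2
    w = 0.0
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(w, xs, ys)
        losses.append(loss)
        w -= lr * grad  # plain gradient descent update
    final_loss, _ = loss_and_grad(w, xs, ys)
    losses.append(final_loss)
    return losses

losses = sanity_check()
print(losses)
```

If the losses are not strictly decreasing on a problem this easy, the bug is in the code rather than in the compute budget, which is exactly what you want to learn before downloading someone else's pre-trained weights.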

piokoch 250 days ago

But in this case the reason is simple: the core algorithm is O(n^2), and this is not going to be improved over a few weeks.
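A quick illustration of what O(n^2) means here, assuming the comment is referring to full self-attention, where every token is scored against every other token. The function name is hypothetical and this only counts query-key pairs; it is not a measurement of any real model.

```python
# Counts the query-key pairs scored by full (non-causal) self-attention.
# Assumption: "the core algorithm" means self-attention over n tokens.

def attention_score_count(n_tokens):
    """Number of entries in the n x n attention score matrix."""
    return n_tokens * n_tokens

for n in (1024, 2048, 4096):
    print(n, attention_score_count(n))
```

Doubling the context length quadruples the number of scores, so faster hardware buys proportionally less context than it buys raw throughput.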

dingnuts 251 days ago

> today you can get a kindergartener level chat model for about $100. Not hard to imagine the same model costing $10 of compute in a few years.

No, it's extremely hard to imagine since I used one of Karpathy's own models to have a basic chat bot like six years ago. Yes, it spoke nonsense; so did my GPT-2 fine tune four years ago and so does this.

And so does ChatGPT

Improvement is linear at best. I still think it's actually a log curve and GPT-3 was the peak of the "fun" part of the curve. The only evidence I've seen otherwise is bullshit benchmarks, "agents" that increase performance 2x by increasing token usage 100x, and excited salesmen proclaiming the imminence of AGI.

simonw 251 days ago

Apparently 800 million weekly users are finding ChatGPT useful in its present state.

infinitezest 251 days ago

1. According to who? OpenAI? 2. Its current state is "basically free and containing no ads". I don't think this will remain true given that, as far as I know, the product is very much not making money.

simonw 251 days ago

Yes, that number is according to OpenAI. They released that 800m number at DevDay last week.

The most recent leaked annualized revenue rate was $12bn/year. They're spending a lot more than that but convincing customers to hand over $12bn is still a very strong indicator of demand. https://www.theinformation.com/articles/openai-hits-12-billi...

bgwalter 251 days ago

Part of that comes from Microsoft API deals. Part of that most certainly comes from the vast network of companies buying subscriptions to help "Open" "AI" [1].

Given the rest of the circular deals, I'd also scrutinize whether the same applies to the revenue. The entanglement with the Microsoft investments and the fact that "Open" "AI" is a private company make that difficult to research.

[1] In a U.S. startup, I went through three CEOs and three HR apps, which mysteriously had to change for no reason but to accommodate the new CEO's friends and their startups.

wordpad 250 days ago

Even with linear progression of model capability, the curve for model usefulness could be exponential, especially if we consider model cost, which will come down.

For every little bit a model gets smarter and more accurate, there are exponentially more real-world tasks it could be used for.

hodgesrm 251 days ago

This. It looks like one of the keys to maintaining open source is to ensure OSS developers have access to capable models. In the best of worlds, LLM vendors would recognize that open source software is the commons that feeds their models and ensure it flourishes.

In the real world...

DennisP 251 days ago

Maybe this isn't possible for LLMs yet, but open source versions of AlphaZero have been trained on peer-to-peer networks.

https://zero.sjeng.org/

https://katagotraining.org/
