by ordu 95 days ago
The bitter lesson has no utility function, but it does have predictive power. Decision theory, Bayesian networks and causality will see niche applications while LLMs are getting all the money. If the former are as good as their promises, their practitioners will be developing tools and accruing knowledge of how to use them and which problems they are good for. This will last until LLMs hit a local maximum and cannot move any further. LLM developers will try to eat even more resources to overcome it, but they'll only get more evidence that LLMs are trapped in a local maximum. Stocks will crash, the market will correct itself, and a lot of smart unemployed people will start looking for ways out of the local maximum.

At that moment things will become really interesting. If decision theory, Bayesianism and causality can show something that can be combined with LLMs to create something marketable, then they will have their big chance. Or maybe those smart people will devise some other way out of the local maximum.

Bayesian methods and causality have their applications, and there are tools for using them, but you can't just feed news into them and get back the most likely structure of a secret global government run by interdimensional lizard people. Perhaps if you combine them with an LLM, the resulting tool will be able to perform this task?

1 comment

What irks me a bit about the way the Bitter Lesson is interpreted is that it seemingly didn't just throw out handcrafted model/feature generation, but also any attempt to interpret the learned models and features.

Like, in theory, this should be the absolute best time for people interested in analyzing unstructured data: there is this wealth of open-weight models, trained on half the internet, that must have developed absolutely insane feature detectors for all kinds of media: programming languages, human-language prose, images, audio, video, whatever you want!

In practice, the models are mostly treated as black boxes and the weights as inscrutable. Which is why we now have the weird situation that our models are able to understand incredibly subtle and abstract semantic concepts in text - but the pre- and postprocessing is still on the level of regexes and string heuristics, like 50 years ago. There doesn't seem to be any in-between.