Hacker News
by stephc_int13 309 days ago
My current intuition on this topic is that they are right about scaling but they are training on the wrong data.

LLMs were not intended to be the core foundation of artificial intelligence but an experiment around deep learning and language. Their success was an almost accidental byproduct of the availability of large amounts of structured data to train on and the natural human bias to be tricked by language (the ELIZA effect).

But human language itself is quite weak from a cognitive perspective, and we end up with an extremely broad but shallow and brittle model. The recent and extremely costly attempts to build reasoning around it don't seem much more promising than using a lot of hardcoded heuristics, basically ignoring the bitter lesson.

I've seen many argue that a real human-level AI should be trained from real-world experience. I am not sure this is true, but training should likely start from lower-level data than language, still using tokens and huge scale, and probably deeper networks.

Not all AI is LLMs. That's just what's most prevalent right now. There's still great work being done by models that don't "speak" but "perform". The issue is they need to be trained to perform like you said. The more tools like Claude Code are used, the more training they receive as well. I do think we'll see a plateau (if we haven't reached it already) of diminishing returns and we'll seek out new algorithms to improve it.

Never underestimate the will of someone determined to gain an extra 10% performance or accuracy. It's the last 1% I worry about. 99.99% uptime is great until it isn't. 99% accuracy is great until it isn't. These things could be mitigated by running inference on different quantizations of a model tree, but ultimately we're going to have to triple-check the work somehow.
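
To put rough numbers on the point above, here is a minimal Python sketch. It is not anyone's production setup: the function names are made up for illustration, the list of answers stands in for outputs from different quantizations of one model, and the binomial estimate assumes the variants fail independently, which quantizations of the same model almost certainly do not.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the most common answer and the fraction of voters that agreed.

    `answers` is a hypothetical list of outputs, one per model variant.
    """
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


def majority_error_rate(p_error, n_voters):
    """Chance that a strict majority of voters is wrong.

    Assumes each voter is wrong independently with probability `p_error`
    (an optimistic assumption for quantizations of one model).
    """
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_error**k * (1 - p_error) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


def downtime_minutes_per_year(uptime_fraction):
    """Minutes of allowed downtime in a 365-day year for a given uptime."""
    return (1 - uptime_fraction) * 365 * 24 * 60


if __name__ == "__main__":
    print(majority_vote(["42", "42", "41"]))
    print(majority_error_rate(0.01, 3))
    print(downtime_minutes_per_year(0.9999))
```

Under the independence assumption, three voters at 99% accuracy each would produce a wrong majority about 0.03% of the time, and 99.99% uptime still allows roughly 52.6 minutes of downtime per year. Correlated errors would shrink the real gain from voting, which is why some independent check is still needed.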

> The more tools like Claude Code are used, the more training they receive as well.

What do you mean? A model doesn't improve because it's being used more. Are you saying Anthropic invests more into Claude Code the more people use it? Or are you saying they collect its output and train it on it?

They probably mean https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu..., but I don't think that is a huge factor for Claude Code.

I assume they mean that they can gather users' inputs (e.g. the user correcting the model, suggesting improvements, etc.).

Definitely smarter people than me have thought about this already, but I’ve been trying to think about human language and how thoughts form in my head lately. How does thinking feel to you?

I feel like thoughts appear in my head conceptually mostly formed, but then I start sequentially coming up with sentences to express them, almost as if I’m writing them down for somebody else. In that process, I edit a bunch, so the final thought is influenced quite a bit by how English tends to be written. Maybe even constrained by expressibility in English. But English has the ability to express fuzzy concepts. And the kernel started as a more intuitive thing.

It is a weird interplay.

LLMs aren't monolingual so you might want to expand that beyond English. Consider how multilingual people think.

Also, apparently it's pretty common for people to think in words and have an internal monologue. I hadn't realized this was a thing until recently but it seems many people don't think abstractly as you've described.

There have been psychological experiments that have shown what OP experiences.

In this particular experiment subjects were asked to make a series of arbitrary choices between 2 items, while being hooked up to some brain scanning equipment. Subjects were asked to think about it for 5 seconds before making the choice.

The experiment showed that scientists were able to predict the choice before the subjects consciously reasoned about it. So the experiment indicates that the choice is made subconsciously, and the reasoning ends up on whatever choice was made.

This aligns pretty well with some other psychological theories, and it suggests that much, if not most, of our brain's processing happens subconsciously, and that our conscious experience mainly serves to provide a coherent story about why we experience what we experience.

So it seems that no matter if you 'think' abstractly or visually or with a monologue, this is merely a small step in our overall cognition, and doesn't really change the fact that things bubble up from our subconscious before we become conscious of them.

5 seconds is an interesting amount of time. I think we all can simulate an argument in our heads, and it could even convince us. But that takes time. 5 seconds seems like just enough time to resolve a thought into words.

I wonder if they could plot the mind-changing process resulting from that sort of argument. If that is what even happens.

Since you understand predictive coding, I believe you will be interested in this:

https://github.com/dmf-archive/IPWT