by Jensson 717 days ago
> Now the LLM can choose to switch, at its own discretion, back and forth between a talking and listening mode

How would it intelligently do this? What data would you train on? You don't have trillions of words of text where humans wrote what they thought silently interwoven with what they wrote publicly.

History has shown over and over that hard-coded, ad hoc solutions to these "simple problems" never work for creating intelligent agents. You need to train the model to do that from the start; you can't patch in intelligence after the fact. Those additions can be useful, but they have never been intelligent.

Anyway, I'd call such a model a "stream of mind model" rather than a language model. It would fundamentally solve many of the problems with current LLMs, whose thinking is reliant on the shape of the answer; a stream of mind model would instead shape its thinking to fit the problem and then shape the formatting to fit the communication needs.

Such a model as this guy describes would be a massive step forward, so I agree with this, but it is way too expensive to train, not due to lack of compute but due to lack of data. And I don't see that data being produced within the next decade, if ever: humans don't really like writing down their hidden thoughts, and you'd need to pay them to generate data amounts equivalent to the internet...


Replying to: How would a model intelligently switch between listening or speaking modes? What data would you train on? (I'm the author of the parent article.)

It's a fair question, and I don't have all the answers. But for this question, there might be training data available from everyday human conversations. For example, we could use a speech-to-text model that's able to distinguish speakers, and look for points where one person decided to start speaking (that would be training data for when to switch modes). Ideally, the speech-to-text model would be able to include text even when both people spoke at once (this would provide more realistic and complete training data).
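As a rough illustration of that idea (not the commenter's own code), here is a minimal sketch of how switch points might be pulled out of a diarized transcript. The `Segment` type, its fields, and the `turn_switch_points` function are hypothetical names; the sketch assumes some speech-to-text system has already produced per-speaker segments with start and end times.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """One diarized utterance (hypothetical schema, assumed for illustration)."""
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def turn_switch_points(segments: List[Segment]) -> List[Tuple[float, str, str, bool]]:
    """Return (time, previous_speaker, new_speaker, overlapped) per speaker change.

    `overlapped` is True when the new speaker started before the previous
    segment ended, i.e. both people were talking at once.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    switches = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.speaker != prev.speaker:
            overlapped = cur.start < prev.end
            switches.append((cur.start, prev.speaker, cur.speaker, overlapped))
    return switches
```

Each returned tuple could serve as a positive "switch to talking here" label, with the moments in between serving as "keep listening" examples. Whether that is enough signal to train on is an open question.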

I've noticed that the audio mode in ChatGPT's app is good at noticing when I'm done speaking to it, and it reacts accurately enough that I suspect it's more sophisticated than "wait for silence." If there is a "notice the end of speaking" model - which is not a crazy assumption - then I can imagine a slightly more complicated model that notices a combination of "now is a good time to talk + I have something to say."

It's surprising people still consider large scale language models as a key solution to the problem of AGI, when it has become quite clear they will hit all practical scaling limits without surpassing the "well informed imbecile" intelligence threshold.

All evidence points towards human reason as a fundamentally different approach, orders of magnitude more efficient at integrating and making sense of ridiculously smaller amounts of training data.

I'm pretty sure that the argument would be that extensions of current LLM and ML techniques could be the solution to the problem of AGI.

And all evidence actually points toward human reason as an incredibly inefficient and horrifyingly error-prone approach, that only got as far as it did because we're running 8.1 billion human minds in parallel.

While evidence suggests that human reasoning uses a fundamentally different approach, it remains to be seen whether human reasoning uses a fundamentally superior approach.

> human reason as an incredibly inefficient

I have yet to see a single AI system that can learn to produce the word "mama" after doing fully self-supervised training, being fed only the cosine transform of the audio it produces and a few hundred hours of video/audio feed showing a mom saying the word and becoming very happy when the word is finally uttered. Did I mention the output must be produced using an array of mechanical oscillators, resonance chambers and bellows with unknown and highly variable acoustic parameters that need to be discovered and tuned at runtime?

I have seen this "human intelligence training is wasteful" line and I think it is complete nonsense. The efficiency with which humans can acquire any language with barely any training data is unfathomably better than that of large-scale statistical models.

> It's surprising people still consider large scale language models as a key solution to the problem of AGI

Marketing, and a bit of collective delusion by a lot of people having the "can't understand what they are paid not to" thing going on.