by morelisp 1469 days ago
For LLMs it's not really that interesting. There's no embodiment at all beyond text, there's no sense of "time" or "speed". Either it's running full-tilt generating a response, or it's off.

As I said, I think it's interesting if it uses the downtime for something else. (Like a chess player thinking on the opponent's clock.)

Talking to itself, speculating about continuations, etc., but not actually outputting them (otherwise it's just a longer response). Instead it stores some parts in a buffer, perhaps bifurcating. Even better, it uses some downtime to summarise recent material and stores the summary in the buffer. That's not a bad description of "thinking".

But then it's not a regular LLM anymore and we don't really know how to build that.

The whole motivation for these models is "attention is all you need": dropping recurrence removes the sequential dependencies that would otherwise halt this obscene scaling in its tracks.

It doesn't need to be trained any differently. We can use the current LaMDA model to do this (if we have real access).

We would put a small non-learning/non-neural interface on top of the system to implement these ideas. That interface could act like this:

* To ask for extra "thinking" text: after the underlying model stops typing, we output its response to the human user as usual. But we then do the equivalent of pressing tab to request more text, and buffer that instead of showing it.

* To summarise some text (e.g. some of the thinking text from above), we can use another instance. We put it into summarisation mode, e.g. using the TL;DR hack or any other method. Summarised text can be used as a prompt, or as output.

* We can bifurcate by copying instance state and starting a new instance.
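
To make the three bullets concrete, here is a rough sketch of such a wrapper. Everything in it is hypothetical: `ThinkingWrapper`, its method names, and `echo_model` are made up for illustration, and `echo_model` is only a toy stand-in for whatever call actually asks the real model for a continuation. "Instance state" is assumed to be nothing more than the text context plus the hidden buffer.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface: takes a prompt, returns a continuation.
Generate = Callable[[str], str]


def echo_model(prompt: str) -> str:
    """Toy stand-in for a real model; just reports the prompt length."""
    return f"<{len(prompt)}>"


@dataclass
class ThinkingWrapper:
    generate: Generate
    context: str = ""                                # text the user has seen
    buffer: List[str] = field(default_factory=list)  # hidden "thinking" text

    def _prompt(self) -> str:
        # Hidden notes are appended to the visible context when prompting.
        return self.context + "".join(self.buffer)

    def respond(self, user_text: str) -> str:
        """Normal stimulus-response turn: the reply is shown to the user."""
        self.context += user_text
        reply = self.generate(self._prompt())
        self.context += reply
        return reply

    def think(self, rounds: int = 1) -> None:
        """The 'press tab' step: request more text, but buffer it."""
        for _ in range(rounds):
            self.buffer.append(self.generate(self._prompt()))

    def summarise(self) -> str:
        """Compress the buffer using the TL;DR prompt trick."""
        summary = self.generate("".join(self.buffer) + "\nTL;DR:")
        self.buffer = [summary]
        return summary

    def bifurcate(self) -> "ThinkingWrapper":
        """Fork: copy the state so the two branches can diverge."""
        return copy.deepcopy(self)
```

Usage would look like `w = ThinkingWrapper(echo_model); w.respond("hi"); w.think(2); fork = w.bifurcate()`. With a real model, `summarise` could just as well call a second instance, as the bullet suggests; using the same callable here only keeps the sketch short.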

These are pretty basic ideas, probably already in use, but I think they show how we can expand the system from a kind of instantaneous stimulus-response to something more interesting.

Hopefully it's clear this is not equivalent to sleep(10). In my view it doesn't make the system more intelligent, rather it allows the system to use its existing abilities more fully.

(edit: another aspect we could control would be switching the system between high-temperature modes and low-temperature modes, in different instances, and depending on what we're trying to achieve. This relates a bit to the "speculation" comment made above by another user.)
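
For anyone unfamiliar with the temperature knob, a small sketch, assuming the usual definition where the model's logits are divided by the temperature before the softmax. The function name is made up, and a real serving stack would expose this as a sampling parameter rather than as a function you call yourself.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Low temperature piles probability onto the top token (focused, repeatable output); high temperature flattens the distribution (more speculative output). One instance could "think" at a high temperature while another answers at a low one.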

Perhaps even occasionally, without being prompted by further input, continuing a prior statement with a further "follow-up thought" when some specific threshold of "speculation" is reached?
Or just put a sleep(10) halfway through printing the output. That makes it more intelligent, right?