by hackernews1134 1465 days ago
Why isn't it simply "asleep" when not questioned? In a "coma" perhaps. One made from our programming / "genetics"?!?!

I don't think it is sentient without more evidence. But I'm wondering whether not "thinking" when not questioned is really a good reason to say it isn't?!


When it's not predicting a word, it's not doing anything. Literally GPU usage goes to zero.

Still it's interesting! Obviously we have to program the system to stop after a sentence or two so the user can read it. What if we could program it to keep "talking to itself", eg speculating about possible continuations on both sides.

For LLMs it's not really that interesting. There's no embodiment at all beyond text, there's no sense of "time" or "speed". Either it's running full-tilt generating a response, or it's off.
As I said, I think it's interesting if it uses the downtime for something else. (Like a chess player thinking on the opponent's clock.)

Talking to itself, speculating about continuations, etc - but not actually outputting them (else it's just a longer response). Instead, storing some parts in the buffer. Perhaps bifurcating. Even better, using some down time to summarise recent material and store the summary in the buffer. That's not a bad description of "thinking".

But then it's not a regular LLM anymore and we don't really know how to build that.

The whole motivation behind these models ("Attention Is All You Need") is to get rid of the sequential dependencies of recurrence, which would otherwise stop this obscene scaling in its tracks.

It doesn't need to be trained any differently. We can use the current LaMDA model to do this (if we have real access).

We would put a small non-learning/non-neural interface on top of the system to implement these ideas. That interface could act like this:

* To ask for extra "thinking" text: after the underlying model stops typing, we output that text to the human user. But we then do the equivalent of pressing tab to request more text, and buffer that.

* To summarise some text (eg some of the thinking text from above), we can use another instance. We put it into summarisation mode, eg using the TLDR hack or any other method. Summarised text can be used as a prompt, or as output.

* We can bifurcate by copying instance state and starting a new instance.
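
To make the three bullets above a bit more concrete, here is a rough sketch of what such a wrapper could look like. Everything in it is an assumption for illustration: `Session`, `toy_generate`, and the method names are made up, and `toy_generate` is a deterministic placeholder standing in for a real model call (which I don't have access to). It is not how LaMDA or any real system is wired up.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns a continuation.
Generate = Callable[[str], str]


def toy_generate(prompt: str) -> str:
    """Placeholder for a real LLM call, so the sketch runs on its own."""
    return f"[continuation of {len(prompt)} chars]"


@dataclass
class Session:
    generate: Generate
    context: str = ""  # everything the user and the model have said so far
    buffer: List[str] = field(default_factory=list)  # private "thinking" text

    def reply(self, user_text: str) -> str:
        """Normal turn: the answer is shown to the human user."""
        self.context += f"\nUser: {user_text}\nModel: "
        answer = self.generate(self.context)
        self.context += answer
        return answer

    def think(self, rounds: int = 1) -> None:
        """Idle time: "press tab" for more text, but buffer it instead of showing it."""
        for _ in range(rounds):
            prompt = self.context + "".join(self.buffer)
            self.buffer.append(self.generate(prompt))

    def summarise(self) -> str:
        """Compress the buffer with a TLDR-style prompt and keep only the summary."""
        summary = self.generate("".join(self.buffer) + "\nTL;DR:")
        self.buffer = [summary]
        return summary

    def fork(self) -> "Session":
        """Bifurcate: copy the instance state so the two branches can diverge."""
        return copy.deepcopy(self)
```

Usage would be something like: call `reply()` when the user types, call `think()` a few times while waiting, `fork()` to explore an alternative, and `summarise()` when the buffer gets long. Note the wrapper itself has no learning in it, it only decides when the model is called and where the text goes.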

These are pretty basic ideas, probably already in use, but I think they show how we can expand the system from a kind of instantaneous stimulus-response to something more interesting.

Hopefully it's clear this is not equivalent to sleep(10). In my view it doesn't make the system more intelligent, rather it allows the system to use its existing abilities more fully.

(edit: another aspect we could control would be switching the system between high-temperature modes and low-temperature modes, in different instances, and depending on what we're trying to achieve. This relates a bit to the "speculation" comment made above by another user.)
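
For anyone unfamiliar with what "temperature" does here, a minimal sketch in plain Python. The function names and the example logits are made up for illustration; real systems do the same thing over a vocabulary of tens of thousands of tokens. Low temperature concentrates probability on the top-scoring token, high temperature flattens the distribution, which is roughly the "speculative" mode.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick one token index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 2.0, 3.0]  # made-up scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.1)    # almost all mass on index 2
diffuse = softmax_with_temperature(logits, 100.0)  # close to uniform
token = sample_token(logits, 1.0, random.Random(0))
```

A wrapper could run one instance at low temperature for the user-facing answer and others at high temperature for the buffered speculation.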

Perhaps even occasionally, without being prompted by further input, continuing a prior statement with a further "followup thought" when some specific threshold of "speculation" is reached?
Or just put a sleep(10) halfway through printing the output. That makes it more intelligent, right?
What about the time between receiving the input and producing the output? It's a total black box. For all we know it could "think" for 90% of that time and use only the last bit of GPU power to produce the output (whatever "think" means in this context).

I don't think it's reasonable to compare it with human reasoning, where we think constantly and our brains keep operating even while we sleep. It is not a human being, after all.

Well no, it's not a total black box. We program the computation, ie which matrices are multiplied by which. It doesn't have a choice about this.
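
A toy illustration of that point, assuming a plain stack of matrix multiplies with ReLU in between (a big simplification of a real transformer, and all names here are made up). The sequence of multiplications is fixed by the program, so the amount of work is identical whatever values go in, as long as the input has the same shape.

```python
def matmul(a, b, counter):
    """Multiply two matrices given as lists of rows, counting scalar multiplications."""
    rows, inner, cols = len(a), len(b), len(b[0])
    counter[0] += rows * inner * cols
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def forward(x, weights, counter):
    """Fixed computation: the same multiplies in the same order, whatever x contains."""
    for w in weights:
        x = matmul(x, w, counter)
        x = [[max(0.0, v) for v in row] for row in x]  # ReLU
    return x


def cost(x, weights):
    """Number of scalar multiplications used by one forward pass on x."""
    counter = [0]
    forward(x, weights, counter)
    return counter[0]


weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, -1.0], [2.0, 1.0]],
]
easy = cost([[1.0, 2.0]], weights)
hard = cost([[-100.0, 3.0]], weights)
```

Here `easy` and `hard` come out the same. The model can't decide to spend longer on one input than another; the only way it gets more compute is by generating more tokens.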