Hacker News new | ask | show | jobs
by navigate8310 717 days ago
The author talks about agency which require being able to independently take actions apart from reacting to an input. However, the feedback provided by a two-input model also limits the mind model as it now reacts to the feedback it receives when in listening mode. Isn't it contradictory to the concept if agency?
1 comments

The idea of "agency" I have in mind is simply the option to take action at any point in time.

I think the contradiction you see is that the model would have to form a completion to the external input it receives. I'm suggesting that the model would have many inputs: one would be the typical input stream, just as LLMs see, but another would be its own internal recent vectors, akin to a recent stream of thought. A "mode" is not built in to the model; at each token point, it can output whatever vector it wants, and one choice is to output the special "<listening>" token, which means it's not talking. So the "mode" idea is a hoped-for emergent behavior.

Some more details on using two input streams:

All of the input vectors (internal + external), taken together, are available to work with. It may help to think in terms of the typical transformer architecture, where tokens mostly become a set of vectors, and the original order of the words are attached as positional information. In other words, transformers don't really see a list of words, but a set of vectors, and the position info of each token becomes a tag attached to each vector.

So it's not so hard to merge together two input streams. They can become one big set of vectors, still tagged with position information, but now also tagged as either "internal" or "external" for the source.