by mlsu 833 days ago
Interjecting requires planning ahead.

The way a human interjects is that you have a parallel thought chain going, along with the conversation, as it's happening in real time. In this parallel chain, you are planning ahead. What point am I going to make once we are past this point of conversation? What is the implication of what is being discussed here? (You also are thinking about what the other person is thinking; you are developing a mental model of their thought process).

An LLM does not have any of this architecturally; it just has the text itself. Any planning that people claim to be doing with Llama et al. is really just "pseudo" planning, not the fundamental planning we are talking about here. I suspect it will be a while yet before we have "natural" interjection from LLMs.

When it does come, however, it will be extremely exciting. Because it will mean that we have cracked planning and made the AI far more agentic than it is now. I would love to be proven wrong.

3 comments

Take this with a grain of salt because I'm not super well read on LLMs, but isn't their entire function built on prediction?

Sounds like a reasonable approach could be to have a separate "channel" which focuses entirely on the question of "where is this conversation going?" That could give a pretty good baseline for when and how to interject.

We don't have a model for "Where the conversation is going," we have a model for "What's the next token" which implicitly models "Where is the conversation going."

The difference is significant here, because direct manipulation of the implicit modeling task is required to do the type of planning that I've described.

It's the same reason these LLMs are not "agents": you can only manipulate their world model through the interface of tokens.

> LLM does not have any of this, architecturally, it just has the text itself.

I feel like you are maybe being a bit too focused on the specifics of how the LLM works, whereas:

> The way a human interjects is that you have a parallel thought chain going

You are more abstract in the human case.

They really don’t need to be different here. Each time you type another token, the LLM could be running predictions in parallel, playing out where the conversation is going. You could then layer on another model which blends these together (vaguely like how MoE works) and is trained on opportune times to interject. Think of it like a chess-playing AI, but with the goal of interjecting appropriately rather than checkmate.

Running all these inferences at once would take a fairly expensive amount of compute, but it’s technically all possible today and wouldn’t be that much different from the human case for this specific scenario imho.
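
To make the idea concrete, here is a minimal sketch of "play out the futures, then decide whether to interject." Everything in it is an assumption for illustration: `TOY_NEXT` is a hand-written stand-in for an LLM's next-token distribution, and the fixed `threshold` replaces the trained blending model suggested above. It only shows the shape of the proposal, not a working system.

```python
# Toy stand-in for an LLM (hypothetical): each word maps to the words
# that may follow it, with probabilities.
TOY_NEXT = {
    "we": {"should": 0.5, "could": 0.5},
    "should": {"ship": 0.9, "wait": 0.1},
    "could": {"ship": 0.5, "wait": 0.5},
    "ship": {"friday": 1.0},
}

def futures(last, depth):
    """Enumerate every continuation of up to `depth` words after `last`,
    together with its probability under the toy model."""
    options = TOY_NEXT.get(last)
    if depth == 0 or not options:
        return {(): 1.0}
    out = {}
    for word, p in options.items():
        for tail, q in futures(word, depth - 1).items():
            key = (word,) + tail
            out[key] = out.get(key, 0.0) + p * q
    return out

def should_interject(last, depth=2, threshold=0.8):
    """Interject only when one predicted future clearly dominates.
    The threshold is a placeholder for a learned 'when to interject' model."""
    dist = futures(last, depth)
    best = max(dist, key=dist.get)
    return dist[best] >= threshold, best, dist[best]
```

Under this toy model, `should_interject("should")` finds one dominant future and says yes, while `should_interject("we")` sees the futures split several ways and holds back. Note that the decision is still computed purely from predicted tokens.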

Running predictions in parallel is just doing prediction and we're back at square one. Why do things in parallel in that case? At that point, you are just training an "opportune interjection model" with the existing token stream as it comes. Which is subject to exactly the limitation that I described.

These models do have an implicit model of thought, but it is only accessible through the token interface. You need more explicit access, which is not possible given the current architecture.

I'd like to be wrong here.

Writing this out made me think immediately of speculative execution.

Interjection, similarly, saves "conversation cycles" by speculating about the future of a conversation and computing a response for the most likely branch.

When the branching point comes, that's the interjection. It's either successful (moves the conversation forward) or fails (wastes time when the branch is not predicted properly).
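
A rough sketch of the analogy, assuming two hypothetical callables: `predict_next` guesses what the other person will say next, and `respond` builds a reply to an utterance. Neither is a real API; in practice both would be model calls. The reply is prepared before the branch point and is either used (hit) or thrown away (miss), like a speculated branch.

```python
def speculate(heard, predict_next, respond):
    """Before the branch point: guess the next utterance and prepare a reply."""
    guess = predict_next(heard)
    return guess, respond(guess)

def resolve(guess, prepared, actual, respond):
    """At the branch point: use the prepared reply on a hit, redo the work on a miss."""
    if actual == guess:
        return prepared, True       # hit: the interjection is ready immediately
    return respond(actual), False   # miss: the speculative reply is discarded

# Toy stand-ins (assumptions, for illustration only).
def toy_predict(heard):
    return {"the budget is": "too small"}.get(heard, "")

def toy_respond(utterance):
    return f"reply to: {utterance}"
```

Usage: call `speculate("the budget is", toy_predict, toy_respond)` while the other person is still talking, then pass the result to `resolve` once they actually finish the sentence. A hit moves the conversation forward at no extra cost; a miss costs the wasted speculation plus a fresh reply.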