| HN Mirror


by thomastjeffery 490 days ago

I guess the crux of it is this: is it training or awareness?

What I see happening between the <think> tags of Deepseek-R1 is essentially a premade set of circular prompts. Each of these prompts is useful, because it explores a path of tokens that are likely to match a written instance of logical deduction.

When the <think> continuation rewrites part of a prompt as a truthy assertion, it reaches a sort of fork in the road: to present a story of either acceptance or rejection of that assertion. The path most likely followed depends entirely on how the assertion is phrased (both in the prompt, and in the training corpus). Remember that back in the training corpus, example assertions that look sensible are usually followed by a statement of acceptance, and example assertions that look contradictory or fallacious are usually followed by a statement of rejection.

Because the token generation process follows an implicit branching structure, and because that branching structure is very likely to match a story of logical deduction, the result is likely to be logically coherent. It's even likely to be correct!
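To make the fork-in-the-road idea concrete, here is a toy sketch. It is not how Deepseek-R1 or any real model works; the miniature corpus, the labels "sensible"/"contradictory", and the function names are all made up for illustration. It only counts how often an assertion that looks a certain way was followed by acceptance or rejection, then follows the more common path.

```python
from collections import Counter, defaultdict

# Hypothetical miniature "training corpus": each entry pairs how an
# assertion LOOKS with the kind of statement that followed it.
CORPUS = [
    ("sensible", "accept"),
    ("sensible", "accept"),
    ("sensible", "accept"),
    ("sensible", "reject"),
    ("contradictory", "reject"),
    ("contradictory", "reject"),
    ("contradictory", "accept"),
]


def fit(corpus):
    """Count which follow-up appears after each kind of phrasing."""
    counts = defaultdict(Counter)
    for look, follow_up in corpus:
        counts[look][follow_up] += 1
    return counts


def continuation_probs(counts, look):
    """Relative frequency of each follow-up, given only the phrasing."""
    total = sum(counts[look].values())
    return {follow_up: n / total for follow_up, n in counts[look].items()}


def most_likely(counts, look):
    """The path most likely followed at the fork."""
    return counts[look].most_common(1)[0][0]


counts = fit(CORPUS)
print(continuation_probs(counts, "sensible"))
print(most_likely(counts, "contradictory"))
```

Nothing in this sketch ever checks whether an assertion is true. The branch taken depends only on how the assertion looks and on what followed similar-looking assertions in the corpus.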

The distinction I want to make here is that these branches are not logic. They are literary paths that align to a story, and that story is - to us - a well-formed example of written logical deduction. Whether that story leads to fact or fiction is no more and no less than an accident. We humans often tend to follow a similar process, but we can actively choose to do real critical thinking instead.

This design pattern is really useful for a few reasons:

- it keeps the subjects of the prompt in context

- it presents the subjects of the prompt from different perspectives

- it often stumbles into a result that is equivalent to real critical thinking

On the other hand,

- it may fill the context window with repetitive conversation, and lose track of important content

- it may get caught in a loop that never ends

- it may confidently present a false conclusion to itself, then expand that conclusion into a whole thread

- the false conclusions it presents will be much less obvious, because they will always be written as if they came out of a thorough process of logical deduction

I find that all of these problems are much more likely to occur when using a smaller locally hosted copy of the model than when using the full-sized one that is hosted on chat.deepseek.com. That doesn't mean these are solved by using a bigger model, only that the set of familiar examples is large enough to fit most use cases. The more unique and interesting your conversation is, the less utility these models will have.

1 comments

yetihehe 490 days ago

> We humans often tend to follow a similar process, but we can actively choose to do real critical thinking instead.

> - it may confidently present a false conclusion to itself, then expand that conclusion into a whole thread

I want to know how that differs from human "real critical thinking", because I may be missing this function. How do you know whether what you thought of is true or false? I only know it because I think I know it. I have made a lot of mistakes in the past with a lot of confidence.

> The more unique and interesting your conversation is, the less utility these models will have.

Yeah, that also happens with a lot of people I know.

> ... the result is likely to be logically coherent. It's even likely to be correct!

Yeah, a lot of training data made sure that what it outputs is as correct as possible. I still remember my own training, over many days and nights, to be able to multiply properly, with two different versions of the multiplication table and many wrong results until I got it right.

> I guess the crux of it is this: is it training or awareness?

I don't think LLMs are really aware (yet). But they do indeed follow a logical reasoning method, even if it's not perfect yet.

Just a thought: when do you think about how and what you think (awareness of your thoughts)? While you are actually thinking through a problem, or after that thinking? Maybe to be self-aware, AIs should be given some "free-thinking time". Currently it's "think about this problem and then immediately stop, do not think any more". Currently, training data discourages any "out-of-context" thinking, so they don't do it.

thomastjeffery 489 days ago

We know what true and false mean. An LLM knows what true and false are likely to be surrounded with.
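A rough illustration of "what true and false are likely to be surrounded with", as a toy sketch only. The four sentences and the `neighbours` helper are invented for this example, and a real LLM learns something far richer than a word-window count.

```python
from collections import Counter

# Hypothetical example sentences; any small text sample would do.
SENTENCES = [
    "the claim is true because the premises hold",
    "the claim is false because the premises conflict",
    "that is true because it follows",
    "that is false because it contradicts itself",
]


def neighbours(word, sentences, window=2):
    """Count the words that appear within `window` positions of `word`."""
    seen = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                start = max(0, i - window)
                seen.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return seen


print(neighbours("true", SENTENCES))
print(neighbours("false", SENTENCES))
```

In this tiny sample the two words end up with exactly the same neighbours, so a purely surrounding-words view cannot tell them apart here. Which one is meant has to come from somewhere else.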

The problem is that expressions of logic are written many ways. Because we are talking about instances of natural language, they are often ambiguous. LLMs do not resolve ambiguity. Instead, they continue it with the most familiar patterns of writing. This works out when two things are true:

1. Everything written so far is constructed in a familiar writing pattern.

2. The familiar writing pattern that follows will not mix up the logic somehow.

The self-prompting train-of-thought LLM pattern is good at keeping its exploration inside these two domains. It starts by attempting to phrase its prompt and context in a particular familiar structure, then continues to rephrase it with a pattern of structures that we expect to work.

Much of the logic we actually write is quite simple. The complexity is in the subjects we logically tie together. We also have some generalized preferences for how conditions, conclusions, etc. are structured around each other. This means we have imperfectly simplified the domain that the train-of-thought writing pattern is exploring. On top of that, the training corpus may include many instances of unfamiliar logical expressions, each followed by a restatement of that expression in a more familiar/compatible writing style. That can help trim the edge cases, but it isn't perfect.

---

What I'm trying to design is a way to actually resolve ambiguity, and do real logical deduction from there. Because ambiguity cannot be resolved to a single correct result (that's what ambiguity means), my plan is to, each time, use an arbitrary backstory for disambiguation. This way, we could be intentional about the process instead of relying on the statistical familiarity of tokens to choose for us. We would also guarantee that the process itself is logically sound, and fix it where it breaks.
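As a very rough sketch of what "pick a backstory, then deduce" could look like: this is only a guess at the shape of the idea, not the actual design. The `READINGS` table, the two backstories, the rules, and every function name are hypothetical. The backstory makes the choice of reading explicit, and everything after that is plain forward chaining over fixed rules.

```python
# Each ambiguous phrase maps to its candidate readings (hypothetical data).
READINGS = {
    "bank": ["river_bank", "money_bank"],
}

# A backstory is an arbitrary but explicit choice of reading per phrase.
BACKSTORY_FISHING = {"bank": "river_bank"}
BACKSTORY_FINANCE = {"bank": "money_bank"}

# Rules: when every premise is a known fact, the conclusion becomes a fact.
RULES = [
    ({"at(river_bank)"}, "near(water)"),
    ({"at(money_bank)"}, "near(cash)"),
    ({"near(water)"}, "can(fish)"),
]


def disambiguate(phrase, backstory):
    """Resolve a phrase using the backstory instead of token familiarity."""
    choice = backstory[phrase]
    if choice not in READINGS[phrase]:
        raise ValueError(f"{choice!r} is not a known reading of {phrase!r}")
    return choice


def deduce(facts, rules):
    """Forward chaining: apply rules until no new fact can be added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


def interpret(phrase, backstory):
    """Disambiguate first, then deduce from the chosen reading."""
    reading = disambiguate(phrase, backstory)
    return deduce({f"at({reading})"}, RULES)


print(sorted(interpret("bank", BACKSTORY_FISHING)))
print(sorted(interpret("bank", BACKSTORY_FINANCE)))
```

The same phrase leads to different conclusions under different backstories, but each step is inspectable: the choice of reading is stated up front, and the deduction only ever applies rules whose premises are already established.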