HN Mirror



	by mckirk 1145 days ago
	The problem is that these models do not have any working memory they could use to carry out such tasks, which sit on a meta-level when seen from a language perspective. They can only go with their 'gut instinct' when selecting the next word; they can't 'consider and ponder the problem internally' first.

2 comments

sp332 1145 days ago

The problem is that the input is tokenized before the model sees it. It does not see the individual letters "t" + "o"; it gets one single token, #1462. The word "toe" is another single token, #44579. Maybe over time it could learn from context that inputs that start with #44579 also satisfy the constraint of starting with #1462, but that's a lot of work and it's not going to happen for all combinations of letters.
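
To make the tokenization point concrete, here is a minimal Python sketch. It assumes a toy two-word vocabulary that reuses the IDs quoted in this comment (#1462 and #44579). Those IDs are not checked against any real tokenizer, and `encode` and `id_starts_with` are hypothetical helpers written only for illustration.

```python
# Toy vocabulary (assumption): the IDs are the ones quoted in the comment,
# not verified against any real tokenizer.
TOY_VOCAB = {"to": 1462, "toe": 44579}


def encode(words):
    """Map each whole word to its single opaque token ID."""
    return [TOY_VOCAB[w] for w in words]


def id_starts_with(token_id, prefix_id):
    """Naive prefix check on the IDs themselves, which is all the model receives."""
    return str(token_id).startswith(str(prefix_id))


ids = encode(["to", "toe"])
print(ids)                          # [1462, 44579]
print("toe".startswith("to"))       # True: the letters share a prefix
print(id_starts_with(44579, 1462))  # False: the IDs carry no spelling information
```

The string-level check succeeds while the ID-level check fails, so nothing in the numbers tells the model that one word begins with the other. Any such relation would have to be learned from context, one token pair at a time.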


jameslevy 1145 days ago

Perhaps try prompting the model to first describe its approach to answering the question. This kind of chain-of-thought technique can yield better results.
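
A rough sketch of what such a prompt could look like, in Python. The function name `build_cot_prompt`, the instruction wording, and the example question are all assumptions made for illustration. The result is plain text that would be sent to whichever model is in use, and no particular API is implied.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model is asked to explain its approach first."""
    return (
        "First, describe step by step how you would approach the question below.\n"
        "Then follow that approach and give your final answer on a line "
        "starting with 'Answer:'.\n\n"
        f"Question: {question}"
    )


# Hypothetical example question, chosen to match the letter constraint discussed here.
prompt = build_cot_prompt("List five words that start with the letters 'to'.")
print(prompt)
```

The idea is that the text the model writes before its answer can act as a kind of scratch space, which may partly make up for the lack of internal working memory mentioned at the top of the thread.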
