There are many more failure modes specific to LLMs that derive from their auto-regressive nature. "Enough training data" isn't enough; the training data also needs to include lots of directions for when and how to hedge outputs, so that the model doesn't dig itself into a hole.

Example query: "list 5 songs where the lyrics start with 'hey' but the title doesn't"

It will confidently hallucinate answers where the lyrics do start with "hey", but so does the song title. But if you tell it to first output the lyric and then the song title, it will correctly check that both conditions are true before claiming a match, since both pieces of evidence are already in its context by the time it commits to an answer. "Sufficiently similar training data" wouldn't help in this case, or at least not without making the training data so exhaustive as to be impractical.
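To make the difference concrete, here is a rough Python sketch of the two prompt variants, plus the check the model is effectively being asked to perform. The names `build_prompt` and `matches` are made up for illustration, the exact wording of the extra instruction is just one guess at what works, and no real model API is called.

```python
def build_prompt(lyric_first: bool) -> str:
    """Return the example query, optionally with an evidence-first instruction."""
    task = 'List 5 songs where the lyrics start with "hey" but the title doesn\'t.'
    if not lyric_first:
        return task
    # Hypothetical wording: ask for the evidence before the claim.
    return (
        task
        + "\nFor each candidate, write the opening lyric first, then the title,"
        + " and only then say whether it qualifies."
    )


def matches(opening_lyric: str, title: str) -> bool:
    """True only if the lyric starts with 'hey' and the title does not."""
    def starts_with_hey(text: str) -> bool:
        return text.strip().lower().startswith("hey")

    return starts_with_hey(opening_lyric) and not starts_with_hey(title)


if __name__ == "__main__":
    print(build_prompt(lyric_first=True))
    # Placeholder pairs, not real songs.
    print(matches("Hey there, stranger", "Stranger"))
    print(matches("Hey hey hey", "Hey Hey"))
```

The second placeholder pair is the failure case described above: the lyric condition holds, but the title condition doesn't.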

This is essentially another kind of CoT prompting, which helps with these failure modes. It seems difficult to train the models themselves to recognize when they need a suitable strategy to work around issues like these (as opposed to prompting them to use one).