| HN Mirror

This recommendation isn't about prompts that include notes of "what didn't work". I'm talking about prompts that directly inform the model, "you are modelling an idiot".

The former is reasonable to include when iterating. The latter is a recipe for outcome degradation. GP above gave the latter form. That activates attention from parts of the model guiding towards confabulation and loss of faithfulness.

The model doesn't know what is true, only what is plausible to emit. The hypothesis that plausibility converges with scale towards truth and faithfulness remains very far from proven. Bear in mind that the training data includes large swaths of arbitrary text from the Internet, from real life, and from fiction, which includes plenty of examples of people being wrong, stupid, incompetent, repetitive, whimsical, phony, capricious, manipulative, disingenuous, repetitive, argumentative, and mendacious. In the right context these are plausible human-like textual interactions, and the only things really holding the model back from completing in such directions are careful training and the system prompt. Worst case scenario, perhaps the corpus included parliamentary proceedings from around the world. "Suppose you were an idiot. And suppose you were a member of Congress. But I repeat myself." - Mark Twain