| HN Mirror



	by js8 52 days ago
	A very human thing to do is not telling us which model failed like this! They are not all alike; some are, from what I observe, an order of magnitude better at this kind of stuff than others. I believe how "neurotypical" (for lack of a better word) you want a model to be is a design choice. (But I also believe model traits such as sycophancy, some hallucinations, or moral transgressions can be a side effect of training to be subservient. It is similar with humans: they tend to do these things when they are forced to perform.)

2 comments

nialse 52 days ago

Codex in this case. I didn't even think about mentioning it. I'll update the post if it's actually relevant. Which I guess it is.

EDIT: It's specifically GPT-5.4 High in the Codex harness.


anuramat 52 days ago

weird, for me it was too un-human at first, taking everything literally even if it doesn't make sense; I started being more precise with prompting, to the point where it felt like "metaprogramming in english"

claude on the other hand was exactly as described in the article


zingar 52 days ago

Also the exact model/version if you haven't already.


TZubiri 51 days ago

Also, there are no specific examples of what the prompt was and what the result was. Just a big nothingburger.
