HN Mirror

by cadamsdotcom 47 days ago

> An early version of Claude Opus 4.6 would sometimes mysteriously respond to English queries in other languages. NLAs helped Anthropic researchers discover training data that caused this.

Very cool - sounds similar to OpenAI’s goblin troubles.

https://openai.com/index/where-the-goblins-came-from/

1 comment

Destructotor 46 days ago

I'm not sure the causes were really similar. The language switching came from malformed supervised training data in which the prompt had been translated but the answer was left in the original language. The goblins came from a biased RL reward model.
