HN Mirror


by gAI 25 days ago

Anthropic’s research makes the case that role-playing is inherent to how the models work. Communication implies a sender. Language implies a writer, and the models learn these roles implicitly during training. RLHF is meant to strengthen the attractor to the Assistant persona.

https://www.anthropic.com/research/persona-selection-model

https://www.anthropic.com/research/assistant-axis

https://www.anthropic.com/research/emergent-misalignment-rew...

https://www.anthropic.com/research/emotion-concepts-function

2 comments

hashmap 25 days ago

RLHF very much does do that. My take is that RLHF as a mechanism ought to be avoided altogether, and even the selection of the assistant attractor basin is suspect. If I am exploring a problem space, I don't want to hire Igor to explore it with me; it's more helpful to have a colleague who will sort of jump out and say "nah, that's dumb, what if we throw out that whole thing and try a completely different angle instead".


forshaper 21 days ago

Given the incentives that bring out the personalities in various occupations, I would guess other personas would be better suited to getting a task done than 'therapist' or 'tech HR rep'.

For example: an explosive ordnance disposal technician, a surgeon, or a salvage saturation diver.
