|
|
|
|
|
by gordonhart
118 days ago
|
|
Anecdotally, 4o's sycophancy was higher than any other model I've used. It was aggressively "chat-tuned" to say what it thought the user wanted to hear. The latest crop of frontier models from OpenAI and others seems to have significantly improved on this front — does anybody know of a sycophancy benchmark attempting to quantify this? |
|