| HN Mirror


by ekidd 458 days ago

Also, I've been hearing a lot of complaints that Chatbot Arena tends to favor:

- Lots of bullet points in every response.

- Emoji.

...even at the expense of accurate answers. And I'm beginning to wonder if the sycophantic behavior of recent models ("That's a brilliant and profound idea") is also being driven by Arena scores.

Perhaps LLM users actually do want lots of bullets, emoji and fawning praise. But this seems like a perverse dynamic, similar to the way that social media users often engage more with content that outrages them.

3 comments

kozikow 458 days ago

Adding to that: at this point, it feels to me that arenas are getting too focused on fitting user preferences rather than on actual model quality.

In reality I prefer different models for different things, and quite often it's because model X is tuned to return more of what I prefer, e.g. Gemini tends to be the best in non-English languages, ChatGPT works better for me personally for health questions, ...


n8m8 458 days ago

Interesting idea, I think I'm on board with this correlation hypothesis. Obviously it's complicated, but it does seem like over-reliance on arbitrary opinions from average people would result in valuing "feeling" over correctness.


jimmaswell 458 days ago

> sycophantic behavior of recent models

The funniest example I've seen recently was "Dude. You just said something deep as hell without even flinching. You're 1000% right:"


pc86 458 days ago

This type of response is the quickest way for me to start verbally abusing the LLM.
