
by godelski 164 days ago

  > That's so meta it applies to everything though.

Fair... but I think you're also overgeneralizing.

Think about how these models are trained. They are initially trained as text completion machines, right? Then, to turn them into chatbots, we optimize for output that humans prefer, given that there is no mathematical metric for "output in the form of a conversation that's natural for humans".

The whole point of LLMs is to follow your instructions. That's how they're trained. An LLM will never laugh at your question, ignore it, or do anything else that humans may naturally do unless it is explicitly trained for that response (e.g. safety[0]).

So that's where the generalization of the more meta comment breaks down. Humans learning to converse aren't optimizing for the preference of the person they're talking to. They don't just follow orders, and if they do we call them things like robots or NPCs.

I go to a business advisor because of their expertise and because I trust that they aren't going to butter me up. But if I go to buy a used car, that salesman is going to try to get me. The way they do that may in fact be to make me think they aren't buttering me up.

Are they being sycophantic? Possibly. There are "yes men". But generally I'd say no. Sycophancy is on the extreme end, despite many of its features being common and normal. The LLM is trained to be a "yes man" and will always be a "yes man".

  tldr:

  Denpok from Silicon Valley is a sycophant and his sycophancy leads to him feigning non-sycophancy in this scene
  https://www.youtube.com/watch?v=XAeEpbtHDPw

[0] This is also why jailbreaking is not that complicated. Safety mechanisms are more like patches and they're in an unsteady equilibrium. The underlying models are explicitly trained to be sycophantic.