| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by TeMPOraL 342 days ago

The point is, there is no hard boundary. The LLM too may know[0] following instructions in data isn't part of transcription task, and still decide to do it.

[0] - In fact I bet it does, in the sense that, doing something like Anthropic did[1], you could observe relevant concepts being activated within the model. This is similar to how it turned out the model is usually aware when it doesn't know the answer to a question.

[1] - https://www.anthropic.com/news/tracing-thoughts-language-mod...

1 comments

Dylan16807 342 days ago

If you can measure that in a reliable way then things are fine. Mixup prevented.

If you just ask, the human is not likely to lie but who knows with the LLM.

link