|
|
|
|
|
by TeMPOraL
342 days ago
|
|
The point is, there is no hard boundary. The LLM too may know[0] following instructions in data isn't part of transcription task, and still decide to do it. -- [0] - In fact I bet it does, in the sense that, doing something like Anthropic did[1], you could observe relevant concepts being activated within the model. This is similar to how it turned out the model is usually aware when it doesn't know the answer to a question. [1] - https://www.anthropic.com/news/tracing-thoughts-language-mod... |
|
If you just ask, the human is not likely to lie but who knows with the LLM.