by dontlikeyoueith
45 days ago
> The NLA would be forced to use human readable representations to get a successful round trip. That still doesn't guarantee any semantic correspondence between the human readable representation and the model's "thinking". The child's game of "Opposite Day" is a trivial example of encoding internal thoughts in language in a way that does not correspond to the normal meaning of the language.

“We find little evidence of steganography in our NLAs. Meaning-preserving transformations, like shuffling bullet points, paraphrasing, or translating the explanation to French, cause only small drops in FVE, and this gap does not widen over training.”