Hacker News new | ask | show | jobs
by visarga 49 days ago
Beautiful idea, an autoencoder must represent everything without hiding if is to recover the original data closely. So it trains a model to verbalize embeddings well. This reveals what we want to know about the model (such as when it thinks it is being tested, or other hidden thoughts).
1 comments

It could just invent its own secret language embedded into English akin to steganography. The explanation would not lose information but would remain uninterpretable by humans