|
|
|
|
|
by sdenton4
281 days ago
|
|
The spare autoencoder work is /exactly/ premised on the kind of near-orthogonality that this article talks about. It's called the 'superposition hypothesis' originally:
https://transformer-circuits.pub/2022/toy_model/index.html The SAE's job is to try to pull apart the sparse nearly-orthogonal 'concepts' from a given embedding vector, by decomposing the dense vector into a sparsely activation over-complete basis. They tend to find that this works well, and even allows matching embedding spaces between different LLMs efficiently. |
|