by heisenburgzero
973 days ago
Why do embeddings have linear properties that let you compare them with functions like cosine similarity? After the signal passes through so many non-linear activation layers, it seems those linear properties should have broken down, or at least come with no guarantees. I wasn't able to find a good answer online.
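
For reference, here is a minimal sketch of what cosine similarity actually computes, assuming plain Python lists as stand-ins for embedding vectors. The function name and the sample vectors are illustrative only, not taken from any particular library or model. It measures only the angle between two vectors, so it is defined for any pair of nonzero vectors regardless of how they were produced.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))      # dot product u . v
    norm_u = math.sqrt(sum(a * a for a in u))   # Euclidean length of u
    norm_v = math.sqrt(sum(b * b for b in v))   # Euclidean length of v
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional "embeddings", made up for illustration.
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]    # same direction as emb_a, twice the length
emb_c = [-3.0, 0.0, 1.0]   # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

The result depends only on direction, not magnitude, so the open question is why the directions a trained network produces end up being meaningful.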