by Grimblewald
407 days ago
> Any high-enough dimensional space means the distance between any two vectors tends towards 1

Yes, but you're forgetting the impact of the attention mechanism. High-dimensional embeddings do suffer from concentration of distances, but attention mitigates this by adaptively weighting relationships between tokens, which lets task-specific structure emerge that isn't purely reliant on geometric distance. If we can effectively zero out many of the dimensions in a context-sensitive way, much of the curse-of-dimensionality argument stops applying. It's obviously not perfect; transformers still struggle with over-smoothing, among other issues. But I hope the general intent of my comment is clear.
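To make the "zero out many of the dimensions" point concrete, here is a rough NumPy sketch. It is not attention: the mask is a fixed random choice of coordinates, and `n`, `d`, `k` and the helper `cosine_distance_spread` are made up for illustration. All it shows is that pairwise cosine distances between random vectors bunch tightly around 1 in high dimensions, and spread out again once only a few dimensions are left active.

```python
import numpy as np

def cosine_distance_spread(x):
    """Standard deviation of pairwise cosine distances between rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    upper = np.triu_indices(len(x), k=1)  # each unordered pair once
    return float(np.std(1.0 - sims[upper]))

rng = np.random.default_rng(0)
n, d, k = 200, 4096, 8  # illustrative sizes, not taken from any real model

x = rng.standard_normal((n, d))

# Hypothetical stand-in for a context-sensitive mask: keep only k dimensions.
mask = np.zeros(d)
mask[rng.choice(d, size=k, replace=False)] = 1.0

spread_full = cosine_distance_spread(x)
spread_masked = cosine_distance_spread(x * mask)
print(f"full: {spread_full:.4f}  masked: {spread_masked:.4f}")
```

The spread scales roughly like one over the square root of the number of active dimensions, so the masked case should come out far wider than the full one. Real attention reweights token interactions rather than hard-zeroing coordinates, so treat this as intuition only.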