Hacker News new | ask | show | jobs
by dcastm 974 days ago
I wonder how much better is this, compared to taking the average ( or some other aggregation) of embeddings with a smaller context length. Has anyone done a similar comparison?
1 comments

The issue with averaging is that over large inputs, it drowns out small signal. For example, there is a chance that it completely loses a reference to something made only in a single sentence somewhere in a large document.