by nighthawk454 517 days ago

Usually, we lossily compress both the query and the document (e.g. by mean-pooling each one's token embeddings into a single vector), and then compare the two compressed forms.

With ColBERT, we compare first, at the full token level, for a more detailed comparison, and then reduce the full set of comparisons to a single vector. Naturally, doing more comparisons takes more memory and compute. The idea is that it’s worth it because the more detailed comparisons lead to better results.

tokens —> reduced vector —> comparison

tokens —> comparisons —> reduced vector
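
To make the two orderings concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not ColBERT's actual code: the function names are made up for this example, and the small hand-written arrays stand in for real token embeddings. The late-interaction function follows the MaxSim scoring ColBERT is known for (for each query token, take its best-matching document token, then sum those maxima), so its final reduction yields a single score.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def pooled_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """tokens -> reduced vector -> comparison (hypothetical helper)."""
    q = l2_normalize(query_tokens.mean(axis=0))  # one vector for the query
    d = l2_normalize(doc_tokens.mean(axis=0))    # one vector for the document
    return float(q @ d)                          # a single comparison


def late_interaction_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """tokens -> comparisons -> reduction, MaxSim style (hypothetical helper)."""
    q = l2_normalize(query_tokens)
    d = l2_normalize(doc_tokens)
    sim = q @ d.T                        # every query token vs. every doc token
    return float(sim.max(axis=1).sum())  # best match per query token, then sum


# Toy embeddings: 2 query tokens and 2 document tokens in 2 dimensions.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [1.0, 0.0]])

print("pooled:", pooled_score(query, doc))
print("late interaction:", late_interaction_score(query, doc))
```

The similarity matrix has one entry per (query token, document token) pair, which is where the extra memory and compute go; the pooled version only ever holds one vector per side.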

2 comments

kroolik 517 days ago

Why do you need the vector if you have already compared the query with the result candidate?

nighthawk454 517 days ago

Sorry, you’re right: it pools again to a single comparison scalar in the end.

lysecret 517 days ago

That’s actually a very good explanation, thanks!
