by amitport 1186 days ago
"The way one might phrase a question might be very different content wise from how the document describes the answer."

That's what late-interaction models are for: they replace the dot product with a few transformer layers, so they are able to learn complex semantics.
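To make the contrast concrete, here is a toy sketch, not any particular model. It assumes the document token vectors were computed offline, and it stands in a single unparameterized cross-attention step for the "few transformer layers". The function names and `w_out` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_score(query_vec, doc_vec):
    """Bi-encoder baseline: one vector per text, compared with a dot product."""
    return dot(query_vec, doc_vec)


def late_interaction_score(query_tokens, doc_tokens, w_out):
    """Toy late interaction over per-token vectors (assumption: 1 attention step).

    Each query token attends over the document tokens; the attended vectors
    are mean-pooled and mapped to a scalar by w_out (hypothetical weights).
    """
    dim = len(query_tokens[0])
    pooled = [0.0] * dim
    for q in query_tokens:
        weights = softmax([dot(q, d) / math.sqrt(dim) for d in doc_tokens])
        for w, d in zip(weights, doc_tokens):
            for i in range(dim):
                pooled[i] += w * d[i] / len(query_tokens)
    return dot(pooled, w_out)


rng = random.Random(0)
dim = 4
query_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
doc_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(5)]  # cached offline
w_out = [rng.gauss(0, 1) for _ in range(dim)]
score = late_interaction_score(query_tokens, doc_tokens, w_out)
```

The point is only the data flow: the document side is precomputed per token, and the query-time work is the interaction step rather than a single dot product.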

Of course, this hurts latency and increases embedding size, so you might want to compress and cache the answers. Hence (shameless plug):

https://aclanthology.org/2022.acl-long.457/
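
For a rough idea of what "compress and cache" can look like, here is a minimal sketch. It is not the method from the linked paper: plain per-vector int8 quantization is swapped in as the simplest stand-in, and `quantize`, `dequantize`, and `DocCache` are hypothetical names.

```python
def quantize(vec):
    """Map floats to one byte each, with a per-vector scale factor."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid a zero scale
    codes = bytes(round(x / scale) + 128 for x in vec)
    return scale, codes


def dequantize(scale, codes):
    """Approximate inverse of quantize; error is at most scale / 2 per value."""
    return [(c - 128) * scale for c in codes]


class DocCache:
    """In-memory cache of compressed per-token document vectors."""

    def __init__(self):
        self._store = {}

    def put(self, doc_id, token_vectors):
        self._store[doc_id] = [quantize(v) for v in token_vectors]

    def get(self, doc_id):
        return [dequantize(scale, codes) for scale, codes in self._store[doc_id]]


cache = DocCache()
cache.put("doc-1", [[0.5, -1.0, 0.25], [0.1, 0.2, -0.3]])
approx_tokens = cache.get("doc-1")  # lossy copy, fed to the interaction layers
```

Each value takes one byte instead of a 4-byte float, at the cost of a small reconstruction error.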