by PaulHoule
1226 days ago
The basic trouble with LLMs is that they have a fixed attention window, often somewhere between 512 (BERT) and 4096 (ChatGPT) tokens. If you are handling documents that fit in that window they are magical, but once you go outside what they were trained for they don't really beat BM25 and other classical methods anymore. Certainly larger models will come, and people might find ways to make more scalable LLMs, but for now you are going to be crunching your documents down to size.

It is a path less taken in the industry, but there is a methodology for evaluating search engines; see https://github.com/usnistgov/trec_eval

You can certainly try BM25 and decide off the cuff whether you like it or not, but if you want to try a lot of different things you're going to need a set of documents, queries, and evaluated responses ("is this relevant?").

I'd imagine you could train a retrieval model on that kind of data, much like they train ChatGPT. It's probably not as hard, but it would still be a substantial project that needs a lot of training data. I bet you could beat cosine similarity on the vectors, though.
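To make the "documents, queries, and evaluated responses" idea concrete, here is a rough sketch of a BM25 ranker scored against relevance judgments. Everything in it is an assumption for illustration: the whitespace tokenizer, the `BM25` class, the `precision_at_k` helper, the toy documents, and the `qrels` dict are all made up, and the IDF uses the smoothed `log(1 + ...)` variant. trec_eval reports many more measures than this; the point is only to show the shape of the data you would need.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split; a real system would do more.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 ranker over an in-memory list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in tokens]  # per-document term frequencies
        df = Counter()
        for t in tokens:
            df.update(set(t))  # document frequency: count each term once per doc
        n = len(docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            # Length normalization: long documents are penalized via b.
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query):
        # Best score first; ties broken by document index for determinism.
        scored = [(self.score(query, i), i) for i in range(len(self.tf))]
        return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked document ids that were judged relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


# Toy collection and hand-made judgments (hypothetical data).
docs = [
    "bm25 is a classical ranking function for text retrieval",
    "transformers have a fixed attention window measured in tokens",
    "cooking pasta requires boiling water and salt",
]
qrels = {
    "classical ranking function": {0},
    "attention window tokens": {1},
}

bm25 = BM25(docs)
per_query = {q: precision_at_k(bm25.rank(q), rel, k=1) for q, rel in qrels.items()}
mean_p_at_1 = sum(per_query.values()) / len(per_query)
print(per_query, mean_p_at_1)
```

Swap `BM25.rank` for any other ranker (embeddings plus cosine similarity, a trained model) and the same `qrels` give you a number to compare, which is the whole reason to build the judged set first.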