by xfalcox
294 days ago
Depends on your needs. You surely don't want 32k-long chunks for a standard RAG pipeline. My use case is basically a recommendation engine, where I retrieve a list of forum topics similar to the one currently being read. Since this is dynamic, user-generated content, a topic can be anywhere from 10 to 100k tokens long. Ideally I would generate embeddings from an LLM-generated summary, but that would increase inference costs considerably at the scale I'm running it. With a larger context available out of the box, simply swapping embedding models greatly improved the quality of the recommendations.
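To make the "similar topics" idea concrete, here is a rough sketch of the retrieval step, assuming each topic already has one embedding vector. The `similar_topics` function, the topic ids, and the tiny 3-dimensional vectors are all made up for illustration; this is not the actual implementation, and a real setup would likely use a vector index rather than a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similar_topics(current_id, embeddings, k=3):
    """Return the ids of the k topics most similar to the one being read."""
    current = embeddings[current_id]
    scored = [
        (cosine(current, vec), topic_id)
        for topic_id, vec in embeddings.items()
        if topic_id != current_id  # never recommend the topic itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [topic_id for _, topic_id in scored[:k]]

# Hypothetical embeddings, one vector per whole topic.
embeddings = {
    "topic-a": [1.0, 0.0, 0.0],
    "topic-b": [0.9, 0.1, 0.0],
    "topic-c": [0.0, 1.0, 0.0],
    "topic-d": [0.7, 0.7, 0.0],
}

print(similar_topics("topic-a", embeddings, k=2))  # ['topic-b', 'topic-d']
```

The point of the longer context is that the whole topic goes into a single vector, so no chunking or summarization step is needed before this lookup.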