by phren0logy
461 days ago
That's a real issue, but it's masking some of the problems further downstream, like chunking and other context-related issues. There are some clever proposals to make this work, including some of the stuff from Anthropic and Jina. But as far as I can tell, these haven't been tested thoroughly because everyone is hung up at the OCR step (as you identified).

I'm not sure there's a way to get what a lot of people want RAG to be without actually training the model on all of your data, so that you can "chat with it" the way you can ask ChatGPT for random facts about almost any publicly available information. But I'm not an expert.