by Oras
565 days ago
One challenge I have with RAG is excluding the table of contents, headers/footers, and appendices from PDFs. Is there a tool or technique for this? I'm aware I could use an LLM to do it, or read all pages and strip text that repeats across them (headers/footers), but I want to keep the page number in each chunk's metadata so that citations on retrieval are more accurate.