by chiccomagnus
841 days ago
You are both right about chunking, and I think it is one of the main challenges.
As for more intelligent chunking approaches, I think you should give preprocess.co a try.
It can preprocess and chunk PDFs, Office files, and HTML content.
It follows the original document layout and takes the content semantics into account, so you get better chunks.
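
To make "following the document layout" a bit more concrete, here is a rough sketch of the general idea. It is not preprocess.co's API or algorithm; the function name `chunk_by_layout`, the `max_chars` parameter, and the assumption that headings are lines starting with `#` are all mine, just for illustration. It starts a new chunk at every heading and never cuts a paragraph in half:

```python
def chunk_by_layout(text: str, max_chars: int = 500) -> list[str]:
    """Hypothetical layout-aware chunker: group paragraphs under their heading.

    Assumes blocks are separated by blank lines and headings start with '#'.
    A single paragraph longer than max_chars is kept whole, not split.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0

    def flush() -> None:
        # Emit the blocks collected so far as one chunk and reset the state.
        nonlocal current, size
        if current:
            chunks.append("\n\n".join(current))
        current, size = [], 0

    for block in (b.strip() for b in text.split("\n\n")):
        if not block:
            continue
        is_heading = block.startswith("#")
        # Overflow only counts once the chunk holds body text, so a heading
        # is never left alone, separated from its first paragraph.
        overflow = (
            bool(current)
            and not current[-1].startswith("#")
            and size + len(block) > max_chars
        )
        if is_heading or overflow:
            flush()
        current.append(block)
        size += len(block)

    flush()
    return chunks


doc = "# Intro\n\nFirst paragraph.\n\nSecond paragraph.\n\n# Methods\n\nThird paragraph."
for chunk in chunk_by_layout(doc, max_chars=1000):
    print(repr(chunk))
```

A naive fixed-size splitter would cut wherever the character count runs out; here the cut points are always heading or paragraph boundaries, which is the part that a layout-aware tool would do on real PDFs or HTML with far more signal than blank lines.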