by chiccomagnus 841 days ago
You are both right about chunking, and I think it is one of the main challenges. For more intelligent chunking approaches, I think you should give preprocess.co a try. It can preprocess and chunk PDFs, Office files, and HTML content. It follows the original document layout and takes the content semantics into account, so the chunks respect the document's structure.