by crucialfelix 1108 days ago
The unstructured package works well for partitioning text, Markdown, HTML, and even PDF on structural boundaries like paragraphs, headings (h), horizontal rules (hr), etc.

https://unstructured-io.github.io/unstructured/bricks.html#p...
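To make "structural boundaries" concrete, here is a rough stdlib-only sketch of the idea for Markdown. It is not the unstructured API and not how that package works internally. The function name `split_on_structure` and the chunk labels are made up for illustration, and it only handles ATX headings (`#`), horizontal rules, and blank-line-separated paragraphs.

```python
import re

# ATX heading: one to six '#' characters followed by whitespace and text.
HEADING = re.compile(r"^#{1,6}\s+\S")
# Horizontal rule: three or more '-', '*', or '_' (same character), optionally spaced.
RULE = re.compile(r"^([-*_])(\s*\1){2,}$")


def split_on_structure(markdown_text):
    """Hypothetical helper: split Markdown into (kind, text) chunks.

    Boundaries are headings, horizontal rules, and blank lines.
    """
    chunks = []
    paragraph = []

    def flush():
        # Emit the paragraph collected so far, if any.
        if paragraph:
            chunks.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the current paragraph
        elif RULE.match(stripped):
            flush()
            chunks.append(("rule", stripped))
        elif HEADING.match(stripped):
            flush()
            chunks.append(("heading", stripped.lstrip("#").strip()))
        else:
            paragraph.append(stripped)
    flush()
    return chunks


sample = "# Title\n\nFirst line\nsecond line\n\n---\n\n## Next\nBody"
for kind, text in split_on_structure(sample):
    print(kind, "|", text)
```

A real partitioner has to deal with much more (lists, tables, code fences, HTML tags, PDF layout), which is the reason to reach for a package instead of rolling your own.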