Hacker News
by cckolon 429 days ago
This would be useful when chunking PDFs scanned with OCR. I've done that before and paragraph breaks were detected pretty inconsistently.
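For context, this is a rough sketch of one heuristic for rebuilding paragraph breaks from OCR'd lines. It assumes the OCR output is one string per physical line. It treats a blank line, or a noticeably short line ending in sentence punctuation, as a paragraph boundary. The function name `split_paragraphs` and the `short_ratio` threshold are hypothetical and not taken from any particular OCR tool. Heuristics like this are exactly what tends to break down on inconsistent scans.

```python
from statistics import median


def split_paragraphs(lines, short_ratio=0.8):
    """Group OCR output lines into paragraphs.

    Hypothetical heuristic (not from any specific OCR library):
    - a blank line always ends the current paragraph;
    - a line ending in sentence punctuation that is shorter than
      short_ratio * (median line length) is assumed to end a paragraph.
    """
    nonblank = [ln.strip() for ln in lines if ln.strip()]
    if not nonblank:
        return []
    typical = median(len(ln) for ln in nonblank)

    paragraphs, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Explicit blank line: close the paragraph being built, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        # Short line ending in punctuation: probably the last line of a paragraph.
        if line.endswith((".", "!", "?", ":")) and len(line) < short_ratio * typical:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


lines = [
    "aaaa aaaa aaaa aaaa",
    "bbbb bbbb bbbb bbbb",
    "end.",
    "cccc cccc cccc cccc",
    "",
    "dddd dddd dddd dddd",
]
for p in split_paragraphs(lines):
    print(p)
```

A full-width line that happens to end with a period is not treated as a break, so sentences that wrap exactly at the margin stay in one paragraph. The trade-off is that a genuinely short final sentence on a full-width layout can be misread as a paragraph end.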