by adkatrit 4317 days ago
I bet you'd have better luck just removing all the spaces and then running a text segmentation algorithm over the result. Peter Norvig has some great papers on how to do this effectively. I've wrapped his code in a Tornado web service: https://github.com/adkatrit/text-segmentation-server
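To make the idea concrete, here is a minimal sketch of unigram-based segmentation, assuming a tiny made-up word-count table. It is not the code from the repo above: the names `COUNTS`, `word_logprob`, `segment`, and `resegment` are hypothetical, and a real setup would load counts from a large corpus instead.

```python
import math
from functools import lru_cache

# Toy unigram counts, invented for illustration only (assumption).
# A real model would use counts from a large corpus.
COUNTS = {"the": 500, "quick": 20, "brown": 15, "fox": 10,
          "a": 300, "he": 100, "row": 5}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # longest candidate word to consider


def word_logprob(word):
    """Log probability of a single word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown words get a penalty that grows with length,
    # so long junk tokens lose to sequences of known words.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def _best(text):
    """Return (log probability, words) for the best split of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, min(len(text), MAX_WORD_LEN) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = _best(rest)
        candidates.append((word_logprob(first) + rest_score,
                           (first,) + rest_words))
    return max(candidates)


def segment(text):
    """Split a string with no spaces into its most probable words."""
    return list(_best(text)[1])


def resegment(text):
    """Drop all whitespace, lowercase, then segment from scratch."""
    return segment("".join(text.split()).lower())


print(resegment("the qui ck bro wn fox"))
```

The recursion is memoized, so each suffix is only solved once. For very long inputs you would probably want an iterative version to stay clear of Python's recursion limit.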