by pbh 5424 days ago
I hope readers aren't getting the impression from this article that the code examples provided are the correct way to do word segmentation in English. (Though I understand this is an article about interviewing and not about word segmentation. And this might be considered a preprocessing step for doing things correctly...)

Norvig gives a very approachable version of English word segmentation that uses a language model at the link below.

http://norvig.com/ngrams/
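
For anyone who wants the gist without clicking through, here is a rough sketch of the general idea: score every way of splitting the string with a unigram language model and keep the most probable one. This is not Norvig's code. The counts in `COUNTS`, the unknown-word penalty, and the names `log_prob` and `segment` are made up for illustration; a real version would use counts from a large corpus.

```python
import math
from functools import lru_cache

# Toy unigram counts, invented for illustration only (not real corpus data).
COUNTS = {"choose": 50, "spain": 20, "chooses": 5, "pain": 30,
          "a": 200, "in": 300, "i": 150}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # assumed cap on candidate word length


def log_prob(word):
    """Log probability of a word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Assumed penalty: unknown words get less likely as they get longer.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text):
    """Return (log score, words) for the most probable split of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, min(len(text), MAX_WORD_LEN) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = segment(rest)
        candidates.append((log_prob(first) + rest_score, (first,) + rest_words))
    # Highest total log probability wins.
    return max(candidates)


print(segment("choosespain")[1])  # ('choose', 'spain')
```

Note that a plain dictionary lookup accepts both "choose spain" and "chooses pain"; the language model is what breaks the tie, which is the point of the comment above.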

1 comment

Peter actually emailed me directly, though I was already very familiar with his work on this and similar problems. But yes, I make it very clear in the post (and to candidates) that they should not assume an English -- or even a natural language -- dictionary.