by crazygringo 2518 days ago
I've done a lot of work with n-grams in NLP, and in my experience they're only useful up to 4-6 words at a time, unless you're trying to index proverbs.

The reason is that grammar is intensely hierarchical, so the merely "linear" processing that n-grams do stops being useful beyond things like compound words or short sayings like "don't mind if I do!"
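To make the "linear" point concrete, here is a minimal sketch of word n-gram extraction, assuming plain whitespace tokenization. The helper name `word_ngrams` and the sample sentence are made up for illustration and are not from any particular library.

```python
def word_ngrams(text, n):
    """Return every contiguous run of n words in text, in order."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    # Slide a window of width n across the word list, one word at a time.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "the dog that chased the cat barked"
bigrams = word_ngrams(sentence, 2)
trigrams = word_ngrams(sentence, 3)

# The window only sees adjacent words, so it captures "the cat barked"
# even though the cat is not the one barking.
print(("the", "cat", "barked") in trigrams)

# The actual subject-verb pair is split by the relative clause,
# so it never shows up as a bigram.
print(("dog", "barked") in bigrams)
```

Raising n until the window spans the whole clause would catch "dog ... barked", but by then each n-gram is so specific that it rarely repeats across a corpus, which fits the 4-6 word ceiling described above.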

1 comment

Oh, I totally agree with everything you've said! Some collaborators I'm chatting with have more niche NLP applications where larger n would be valuable, though. I don't want to go into too much detail yet since it's their project idea, and I'm not sure how comfortable they are with blasting it out to the public yet :)