by hboon 4461 days ago
I took a quick look. What he described is essentially a specific case of n-grams (n=2, bigrams).

http://en.wikipedia.org/wiki/N-gram

1 comment

Thanks for the information. So it looks like it's a specific application of an algorithm to vectors of bigrams? The most relevant part of the Wikipedia page (I think): http://en.wikipedia.org/wiki/N-gram#n-grams_for_approximate_...

It also appears that the algorithm I linked is actually the Sørensen–Dice index. They have the exact same formula on the wiki page: http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coef...
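For anyone following along, here is a minimal sketch of the Sørensen–Dice coefficient over character bigrams, 2|X∩Y| / (|X| + |Y|). This is not the code from the gist: the function names are made up for illustration, it lowercases the input, it uses sets of bigrams (so repeated bigrams count once), and returning 0.0 when neither string has any bigrams is my own assumption.

```python
def bigrams(text):
    """Return the set of adjacent character pairs (n-grams with n=2)."""
    s = text.lower()  # assumption: comparison is case-insensitive
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_coefficient(a, b):
    """Sørensen–Dice coefficient on bigram sets: 2*|X & Y| / (|X| + |Y|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Assumption: strings too short to have bigrams score 0.0,
        # which also avoids dividing by zero.
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


# "night" and "nacht" share only the bigram "ht" out of 4 + 4 bigrams.
print(dice_coefficient("night", "nacht"))  # 0.25
```

A score of 1.0 means the two bigram sets are identical and 0.0 means they share nothing. A multiset version that counts repeated bigrams would score strings like "aaaa" differently.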

Appreciate the heads-up. Gave me much better terms to search for. Going to add them to the notes of my gist. I'm on vacation now, so I'll have to do more reading on it over the next few days.

Also made a public gist for whoever is interested:

https://gist.github.com/doorhammer/9957864