Hacker News
by jacquesm 5145 days ago
And, to boot, is still in damage control mode; now that pretending ignorance doesn't work, we're going for the 'unfortunate timing' angle.

Nice work on the distance calculation. I think you've just figured out a way to build a blogspam detector: if an article is linked from a newer article and the two are more than X% similar (with X somewhere in the neighbourhood of 45%), then the newer article is blogspam.
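A minimal sketch of that detector, assuming each article is available as plain text plus a set of outbound link URLs. The `Article` record, the `similarity` and `is_blogspam` functions and the 0.45 default are all hypothetical. As a stand-in for the Levenshtein-based percentage discussed in this thread, it uses the standard library's `difflib.SequenceMatcher.ratio()`, which is a different (matching-blocks) similarity measure; swap in a Levenshtein ratio if you want to match the original idea exactly.

```python
from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Article:
    # Hypothetical article record: URL, publication date, body text, outbound links.
    url: str
    published: date
    text: str
    links: set[str] = field(default_factory=set)


def similarity(a: str, b: str) -> float:
    # Stand-in similarity in [0, 1]: difflib's ratio, NOT a Levenshtein ratio.
    return SequenceMatcher(None, a, b).ratio()


def is_blogspam(newer: Article, older: Article, threshold: float = 0.45) -> bool:
    # Flag `newer` only if it links to `older`, was published later,
    # and its text is more than `threshold` similar to the older article.
    if older.url not in newer.links:
        return False
    if newer.published <= older.published:
        return False
    return similarity(newer.text, older.text) > threshold


original = Article("http://example.com/original", date(2010, 6, 1),
                   "The quick brown fox jumps over the lazy dog.")
rehash = Article("http://example.net/rehash", date(2010, 6, 2),
                 "The quick brown fox jumped over the lazy dog!",
                 {"http://example.com/original"})
print(is_blogspam(rehash, original))  # True
```

Note that the link check runs first, so the comparatively expensive text comparison only happens for article pairs that actually reference each other.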


Thanks, but the distance calculation is the work of Alex Martelli from Stack Overflow. It's one of the sources I cited: http://stackoverflow.com/questions/3106994/algorithm-to-calc... It's simple enough that I probably could have come up with it on my own if I'd spent more time on it, but then again, it was simple enough to find with a Google search.
I could have sworn it was the work of Vladimir Levenshtein: http://en.wikipedia.org/wiki/Vladimir_Levenshtein
The code I referenced measures the difference between strings as a percentage, using Levenshtein distance, which counts the number of single-character edits (insertions, deletions, substitutions) needed to turn one string into the other. If you can find a source that attributes this idea of percentage difference to Levenshtein, then by all means I will acknowledge him. Until then, I will refer to Alex Martelli's code.
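For readers following along, here is a sketch of the classic dynamic-programming (Wagner-Fischer) computation of Levenshtein distance. It is a generic textbook version, not Alex Martelli's or Stavros Korokithakis's code; the function name `levenshtein` is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    # dp[i][j] = minimum edits needed to turn a[:i] into b[:j]
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dp[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or free match)
            )
    return dp[-1][-1]


print(levenshtein("kitten", "sitting"))  # 3
```

"kitten" becomes "sitting" with two substitutions (k to s, e to i) and one insertion (g), hence 3.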
> The code I referenced measures the difference between strings

Exactly, and the algorithm used is, as I said, attributed to Levenshtein. Expressing it as a ratio is hardly novel.
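One common way to express the distance as a ratio, assumed here since the thread doesn't say which normalization Alex Martelli's code uses, is to divide by the length of the longer string, where lev(a, b) is the Levenshtein distance and |a| is the length of a:

$$\text{similarity}(a, b) = 1 - \frac{\operatorname{lev}(a, b)}{\max(|a|, |b|)}$$

For example, lev("kitten", "sitting") = 3 and the longer string has 7 characters, so the similarity is 1 - 3/7 = 4/7, or roughly 57%.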

As for the implementation, Alex Martelli credits Stavros Korokithakis[1], although Levenshtein implementations are two a penny, and this isn't a particularly good one (sorry, Stavros).

[1] http://www.korokithakis.net/posts/finding-the-levenshtein-di...
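Since Levenshtein implementations are plentiful, it may help to show one common refinement. This is not a reconstruction of the code criticized above, whose specific weaknesses the comment doesn't spell out. Each row of the DP table depends only on the previous row, so memory drops from O(mn) to O(min(m, n)). The function name `levenshtein_two_rows` is illustrative.

```python
def levenshtein_two_rows(a: str, b: str) -> int:
    # Levenshtein distance is symmetric, so make b the shorter string;
    # each row then has len(b) + 1 entries.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


print(levenshtein_two_rows("kitten", "sitting"))  # 3
```

The time cost is still O(mn); only the memory footprint shrinks, which matters when comparing long texts such as whole articles.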

> Exactly, and the algorithm used is, as I said, attributed to Levenshtein.

I cited Levenshtein by name in my original comment. I'm guessing you didn't read everything I wrote, because otherwise I don't understand why you would think there's an issue.

> Expressing it as a ratio is hardly novel.

The whole reason I linked to Alex Martelli's post is that it's his work, not mine, novel or otherwise. I just cited the resources I used.