| > We work immensely hard to produce original content as well as link to original sources when deserved...this case was absolutely no different.
Zee, let me tell you something. When you lie, do not lie in a way that can be disproven by math. It was easy, so easy, for me to take five minutes to plug the article text into a difference calculator and get the result. Here's the outcome: there is a 58.632778264680105% difference between Unwieldy's article and TheNextWeb's (41.367221735319895% similar).
If you remove the intro from TheNextWeb's article, there is a 28.213166144200624% difference between Unwieldy's article and TheNextWeb's (71.78683385579937% similar).
Do you want to say the words "original content" again? Because I just determined that your article is at least 41% similar to the one you copied. If we remove the cute introduction, the similarity jumps to over 70%. I used the well-known Levenshtein distance algorithm. It took me a minute to figure out how I would measure how much of TNW's article was plagiarized. And because I do not copy, I will even show you how I got it.
First, here are the articles I compared (text only, line breaks removed): http://notes.unwieldy.net/post/23049725899/plagiarism and http://thenextweb.com/shareables/2012/05/14/how-3-simple-but...
Here's the code I used: http://ideone.com/BdNk2 (Java)
Here's the code I used to calculate the Levenshtein distance: http://en.wikibooks.org/wiki/Algorithm_Implementation/String...
And here's the technique I used to calculate the Levenshtein difference percentage (thanks to Alex Martelli): http://stackoverflow.com/questions/3106994/algorithm-to-calc...
Now, what could you have done? First, you could have admitted there was no way to produce "original content" by copying the original article unless you did actual research beyond what Joshua Gross found. Second, you could have merely posted a link to the article and said, "This is cool. Check this out." And third, you could have been nice on Twitter to the author you shamelessly ripped off. My god, when I can show 41% of your article to be the same as another, the least you should do is be 41% classy about it. |
Nice work on the distance calculation. I think you've just figured out a way to build a blogspam detector: if an article is linked from a newer article and the two are more than X% similar (with X somewhere in the neighbourhood of 45%), then the newer one is blogspam.
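For anyone who wants to try the idea, here is a rough sketch in Python (the code linked above is Java; this is not that code). It assumes the difference percentage is the Levenshtein distance divided by the length of the longer text, which is one common way to normalize it and may not match the linked technique exactly. The names `similarity_percent` and `looks_like_blogspam` are made up for illustration, and the 45% default is only the ballpark figure from this comment.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Assumption: normalize the distance by the longer text's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty texts are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


def looks_like_blogspam(original: str, newer: str, threshold: float = 45.0) -> bool:
    """Hypothetical check: flag the newer article if similarity exceeds the threshold."""
    return similarity_percent(original, newer) > threshold


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity_percent("kitten", "sitting"))
    print(looks_like_blogspam("kitten", "sitting"))
```

This is quadratic in the text lengths, so it is fine for a pair of articles but would need something cheaper (shingling, for instance) before running it across a whole feed.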