HN Mirror



	by danielbarla 4537 days ago
	That's one way of looking at it. On the other hand, they link to the original URL, passing traffic back to the original source. Most "scraper" sites take the content, wrap it in their own similar outer layer, and try to take the ad revenue. For example, I've seen my own StackOverflow answers copied word for word to a scraper site and presented under a made-up name.

dangrossman 4537 days ago

StackOverflow actually allows this; all their data is Creative Commons licensed, and they publish the full database dump on the Internet Archive.

https://archive.org/details/stackexchange

jbinto 4537 days ago

Do the terms of the license allow for this kind of abuse?

Just because something is CC doesn't mean you can do whatever you want with it.

dangrossman 4537 days ago

Yes, they do; it's not abuse when you're given explicit permission. CC BY-SA means you can do whatever you want with it as long as you attribute the source as specified.

leephillips 4537 days ago

"as long as you attribute the source"

danielbarla said that they presented the material under a false name; this goes beyond copying and becomes plagiarism, which I can't imagine is an intended result of the CC license.

aroch 4536 days ago

Is the source 'User X' or 'StackOverflow'? When you reference CC BY-SA code you don't reference the people who, say, checked it into git but rather the whole repo.

Flimm 4536 days ago

CC BY-SA is short for Creative Commons Attribution Share-Alike. BY means you must attribute, and SA means you must license any distributed derivative works under the same license (copyleft). Attribution on its own is not enough.

grey-area 4536 days ago

No, attribution is required.

jliptzin 4536 days ago

Interesting: from the file sizes you can quickly gauge the relative popularity of each subject.

tobehonest 4537 days ago

By showing a tl;dr of the actual Wikipedia page, Google removes the need for the user to click the link. By your reasoning, Google has wrapped the content in its own layer and is trying to take the ad revenue.

smoyer 4537 days ago

Actually, I find that a tl;dr rarely answers the question(s) I have on a topic, but it commonly shows me whether I've found the right Wikipedia page. I usually either click through or refine my search.

bushido 4536 days ago

They don't actually link to the Wikipedia URL. They mask it with a link to another Google page, "/url?sa=t&rct=j&q=&....", which in turn responds with a 200 OK page that redirects to Wikipedia.

Sure, it passes along the keywords etc. But this likely reduces the number of people visiting Wikipedia while increasing Google's ad revenue. If anyone but Google did this, they'd potentially be blacklisted by Google.

VikingCoder 4536 days ago

Actually, they do link to the wikipedia URL.

href="http://en.wikipedia.org/wiki/Scraper_site" appears directly in the source code of that web page.

It also has an onmousedown handler that rewrites the URL to point at Google, so they can tell which link you clicked and use that to improve their ranking system. And Google works very closely with sites to make sure the sites know how to understand the referrals.
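
If you want to check this kind of claim yourself, one approach is to save a results page and list each link's href alongside whether the same tag carries an onmousedown handler. Below is a minimal sketch using Python's standard-library html.parser. The sample markup and the trackClick handler name are hypothetical stand-ins, not Google's actual page source.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect each <a> tag's href and whether it has an onmousedown handler."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, has_onmousedown) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attribute names arrive lowercased
        href = attr_map.get("href")
        if href is not None:
            self.links.append((href, "onmousedown" in attr_map))


def audit_links(html):
    """Return (href, has_onmousedown) for every <a href=...> in the markup."""
    parser = LinkAuditor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical result markup; trackClick is a made-up handler name.
SAMPLE = (
    '<a href="http://en.wikipedia.org/wiki/Scraper_site" '
    'onmousedown="return trackClick(this)">Scraper site - Wikipedia</a>'
)

if __name__ == "__main__":
    for href, tracked in audit_links(SAMPLE):
        print(href, "tracked" if tracked else "plain")
```

The point is that the href in the saved source is the real destination; the handler only swaps in a redirect URL at click time, which is why the two observations above can both be true depending on whether you read the page source or the URL your browser actually follows.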
