by freshhawk
4933 days ago
1. That doesn't address search engines, which are doing it to profit from someone else's work. If you open the door for search engines then how many search-engine-like things do you give passes to?

2. What if I'm scraping it just for me, because I want a different interface? How many friends can I share that with? Can I open source the program?

3. What if I read a bunch of these sites to do research and write up a story on something about it? Not plagiarizing, just summarizing and providing analysis on craigslist rental prices? What if I do this every day? What if I automate that process? The data is transformed just as much as if I had read it myself and crunched the numbers myself, I made just as many requests to the site as my browser would have.

Concepts that have been around a thousand years or more are not fully applicable. Like the printing press, some things alter the scarcity equation for ideas and data distribution and ownership. Considering how little we've agreed on about print after 500 years I have some doubts that this is as closed an issue as you say.
- Respect robots.txt (as mentioned elsewhere) which will often provide a limited subset of all data available
- Give something in return (potential traffic) for the data they reap.
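For anyone wondering what "respect robots.txt" looks like in a script, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `my-scraper` user agent, and the example.com URLs are all made up for illustration; a real scraper would fetch the file from the site it is about to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real scraper would download this from the target site instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# "my-scraper" is an assumed user-agent name, not a real crawler.
allowed = may_fetch(parser, "my-scraper", "https://example.com/listings/1")
blocked = may_fetch(parser, "my-scraper", "https://example.com/private/data")

# Seconds to wait between requests, or None if the file sets no delay.
delay = parser.crawl_delay("my-scraper")
```

Checking `can_fetch` before every request and sleeping for `crawl_delay` between requests covers the basics, though robots.txt is only a convention and says nothing about what you may do with the data afterwards.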
I fully agree that scraping is great, and do it myself frequently. Site operators do have legitimate concerns in some situations though, and those concerns probably come from feeling as if they are being 'ripped off' somehow.
No one in their right mind is going to object to incidental scraping for personal use.
However, scraping is often scripted into cron or the like and that data is then used to profit someone else. I'm usually cool with that, but if someone is running a web site and they are dependent upon ad revenue to keep the servers running, I understand objecting to it.