by IanCal 4655 days ago
Did Weev think that the email addresses didn't count as personal information, and were perfectly fine for anybody to scrape?

> If the server owner had so desired, they could have made the data private by adding a password.

But the server is still just sending data in response to a request, even with a password. The only reason a password is a line we draw is intent. It's hard to say you didn't realise that guessing at someone's password was wrong.


Then, it seems, a good solution to the problem is to have the server owner declare in advance what is intended use and what is not. Accessing information without providing the correct password is certainly unintended use, and so is guessing passwords. Accessing it with the correct password is definitely the intended mode of operation.

The logical next step is to make that machine-readable. Oh, wait, suddenly this comes down to the server software and configuration, which the server developer or administrator had screwed up.

My question is: why don't we take that logical step and simplify things, instead of relying on a complete gray area of "should be common sense" and "you should've known you weren't supposed to do that"?

> Then, it seems, a good solution to the problem is to have the server owner declare in advance what is intended use and what is not.

You mean like the Terms of Use for the AT&T website?

http://www.att.com/gen/general?pid=11561#14

Sort of, but in machine-readable form and at a well-known location (like /robots.txt), so you could read and comply with them before you access the site.
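To make the /robots.txt comparison concrete, here is a minimal sketch of a client checking a machine-readable policy before it fetches anything. It uses Python's standard urllib.robotparser. The policy text, the example.com URLs, and the "scrape.py" user-agent name are made up for illustration, and robots.txt only covers crawl rules, not full terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy a site owner might publish at /robots.txt.
# (Assumption for illustration; not any real site's policy.)
EXAMPLE_POLICY = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_policy(text):
    """Parse robots.txt-style rules from a string, without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = load_policy(EXAMPLE_POLICY)

# A well-behaved client asks before it requests a URL.
print(policy.can_fetch("scrape.py", "https://example.com/index.html"))
print(policy.can_fetch("scrape.py", "https://example.com/private/emails"))

# The declared delay (in seconds) between requests, if any.
print(policy.crawl_delay("scrape.py"))
```

In real use the client would download the file with set_url() and read() instead of parsing a string. Either way, this only tells a client what the owner intends; it does not enforce anything.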

As for those exact terms, I suspect (IANAL) they prohibit almost any access to the site since, for example, they forbid any programmatic access to obtain the information, and I haven't heard of any non-software user-agent implementations.

You can translate "programmatic" as "automated", as in "someone coded a program/tool to, in a programmatic way, access the website and retrieve the data".

As opposed to a human being in a non-programmatic way, opening his browser and accessing the website.

What's so hard about it?

> someone coded a program/tool to, in a programmatic way, access the website and retrieve the data

Doesn't Firefox, for example, fit this description perfectly? Yes, I do manually enter the base URL to access, but if that's the distinguishing feature...

> As opposed to a human being in a non-programmatic way, opening his browser and accessing the website.

... then manually typing in ./scrape.py www.att.com is non-programmatic, too. :)

Or, maybe, I'm not getting the correct meaning of "automated" due to bad English comprehension and false analogies from other languages. But I always thought every request on the Internet is automated and done by some kind of hardware+software combo, so forbidding "programmatic" access is complete nonsense (access control and rate-limiting are the proper solutions).
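For what "rate-limiting" could look like on the server side, here is a rough token-bucket sketch. It is one common approach, not something AT&T or any particular server is known to use. The class name, the parameters, and the injectable clock are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behaviour can be checked without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True if a request may proceed now, consuming one token."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client (e.g. per IP address): 1 request/second, bursts of 2.
bucket = TokenBucket(rate=1, capacity=2)
print(bucket.allow(), bucket.allow(), bucket.allow())
```

The point is that the server enforces the limit itself, the same way for a browser and for a script, instead of relying on a clause about "programmatic" access.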

(And, if that matters, the author of scrape.py does not need to conform to AT&T's TOS if they don't actually use the script themselves.)

Wait: so before accessing a website I have to go read its terms of use?

What if I set up a website, put a clause saying "you agree to pay $50/page view" in there, and hid it away? Google crawlers will find my site in no time, and then I can start raking the dollars in, right?

No: such a clause wouldn't be enforceable in that context.