by receptor 3057 days ago
This is borderline criminal. It's practically a CSRF attack.
3 comments

You might be able to argue that, though you would be arguing against accepted practice (do you want to ban all web crawling?).

Two wrongs don't make a right, but even if we accept that Facebook is in the wrong in this instance (which I don't think I do), the code behind a page that hands out sensitive information to an unauthenticated request, or takes action based on malformed input, is negligent.
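
To make that concrete, here is a minimal sketch of the checks a sensitive endpoint could run before answering. Everything in it is hypothetical and for illustration only: the `SESSIONS` and `PROFILES` stores, the `handle_profile_request` name, and the choice of status codes are assumptions, not anything from the site under discussion.

```python
import re
from typing import Optional, Tuple

# Hypothetical in-memory stores, stand-ins for a real session backend and database.
SESSIONS = {"token-abc": 42}                      # session token -> user id
PROFILES = {42: {"email": "user@example.com"}}    # user id -> sensitive data


def handle_profile_request(session_token: Optional[str], user_id_param: str) -> Tuple[int, dict]:
    """Return (status, body) for a request to view a profile."""
    # 1. Refuse unauthenticated requests: a crawler carries no valid session.
    user = SESSIONS.get(session_token) if session_token else None
    if user is None:
        return 401, {"error": "authentication required"}

    # 2. Refuse malformed input instead of acting on it.
    if not re.fullmatch(r"[0-9]{1,10}", user_id_param):
        return 400, {"error": "malformed user id"}

    # 3. Only hand data to the user it belongs to.
    requested = int(user_id_param)
    if requested != user:
        return 403, {"error": "forbidden"}

    return 200, PROFILES[requested]
```

With checks like these in place, a link fetched by a third-party crawler gets an error response rather than the account data.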

"Information wants to be free" is not just a hippie ideal; it is a technical warning. Unless you take proper measures to control and protect sensitive data, it will find a way out.

No, it's not. It's commonplace for other websites to crawl you.

Just add a robots.txt file or block the user agent at your firewall.
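
As a rough sketch of the robots.txt route, the snippet below uses Python's standard `urllib.robotparser` to show how a well-behaved crawler would interpret such a rule. The rules themselves, the `example.com` URLs, and the use of `facebookexternalhit` as the crawler's user-agent token are assumptions for illustration. Note that robots.txt is only advisory; a firewall or server-side block is what actually enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "facebookexternalhit" token is an
# assumption about which user agent you want to turn away.
ROBOTS_TXT = """\
User-agent: facebookexternalhit
Disallow: /

User-agent: *
Disallow: /account/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("facebookexternalhit/1.1", "SomeOtherBot"):
        for path in ("/profile/42", "/account/settings"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(agent, url))
```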

This sounds a little extreme at first, but I actually totally agree. It's in murky waters when it comes to GDPR, for starters.

Where do they draw the line? Why not run a keylogger through embedded Like buttons and widgets? That sounds worse, but it isn't all that much worse.

> It's in murky waters when it comes to GDPR, for starters.

I'm not sure about Facebook's side, but consider how GDPR applies to the side being crawled: if they, as custodians of PII and other sensitive data, are handing it out to unauthenticated requests, they might be liable for punishment for lack of due diligence.

I agree with this. The website author is potentially liable for providing inadequate protections to the user's PII. I don't see anything that would implicate Facebook here.

There is an interesting side effect that applies to all crawlers, though: when website owners fail to protect their customers' PII like this, crawlers inadvertently gather and store personal data. I can't help but wonder whether there is some liability there and, if there is, whether something like AI or pattern matching could help scrub the info before it is stored.
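
For the pattern-matching idea, a very rough sketch might look like the following. The two regular expressions, the `scrub` name, and the placeholder tokens are all assumptions made for illustration; real PII detection needs far more than this and would still miss plenty of cases.

```python
import re

# Deliberately simple, assumed patterns: they catch common email and
# phone-number shapes only, not PII in general.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub(text: str) -> str:
    """Replace email- and phone-like substrings before the text is stored."""
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub("[phone]", text)


if __name__ == "__main__":
    page = "Contact jane.doe@example.com or +1 (555) 010-9999 today"
    print(scrub(page))
```

A crawler could run something like `scrub` on fetched page text before writing it to its index, though that only reduces what gets stored and doesn't settle the liability question.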

Facebook might have an issue with having collected the data too, of course, but the source site definitely should be taking appropriate measures to avoid handing it out in the first place.