Hacker News new | ask | show | jobs
by bojo 1090 days ago
So, are we saying it's unethical for Google and other search engines who make money off of ad revenue to scrape sites like Twitter? Or are they paying a large sum to Twitter to do this?
5 comments

If Google doesn't provide a way to say "please don't scrape my site", then it 100% unethical.

We have robots.txt. If Google doesn't respect that, it's unethical. Don't you think so?

Does twitter's robots.txt forbid scraping? Judging by the fact it shows up in Google I'd assume not.
Maybe it's time for an llm.txt

Not that the people you want to respect that would

The tricky part is it's much more harder to prove that they didn't respect that.
When there is a value exchange between the two entities that are relatively similar then I think it is ethical. People trade Google making money on ads for their site being found when people search. It is also possible to opt-out.
They benefit mutually from their symbiosis. Financially, AI bro model #1321 doesn’t bring anyone value except their owners.
If done against the wishes of the owner of the site, yes, I would consider that unethical. Thankfully, Google respects robots.txt and noindex.
But it's it ethical for the site owner to block access to random people and companies in the internet to _my_ data? I posted that tweet with the expectation that it's gonna be publicly available. Now the owner of the site is breaking that expectation. I would say that this part is also unethical.

Especially since they're not moderating things or anything.

I would say that this part is also unethical.

Agreed. However, it's probably covered by their terms of service.

Same thing with the recent reddit kerfuffle. I'd have much preferred a Usenet 2.0 instead of centralizing global communications in the hands of a handful of private companies with associated user-hostile incentive structures.

Being indexed by google is optional. Twitter could stop it a any time if they thought it was a bad deal for them. That not comparable to a startup company trying to scrape the entire site to train their AI and using sophisticated techniques to bypass protections Twitter has put in place