by vonklaus 3588 days ago

The current protocols promote data exchange, and since websites are primarily designed to be consumed, there is really no way to stop automated requests. Even companies like Distil Networks[1], which parse in-flight requests, have trouble stopping any sufficiently motivated outfit.

I think data should be disseminated, and free exchange of information is great. Devs should respect website owners as much as possible, although in my experience people seem more willing to rip off large "faceless" sites than mom-and-pops, both because that is where the valuable data is and because it seems more justifiable, even if morally gray.

Regardless, the thing I find most interesting is that Google is most often criticized for selling user data and selling out its users' privacy. However, it is rarely mentioned that Googlebot and the army of Chrome browsers are not only permitted but encouraged to crawl all sites, except a scant few that have achieved escape velocity. Sites that wish to protect their data must disallow and forcibly stop most crawlers except Google's, otherwise they will go unranked. This creates an odd asymmetry: not only does Google retain massive leverage, but any other search engine or aggregator faces more hurdles and has fewer resources to compete.
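
To make the asymmetry concrete, here is a minimal sketch of the kind of robots.txt policy I mean, checked with Python's standard `urllib.robotparser`. The policy text, the `example.com` URL, the `SomeOtherBot` name, and the `allowed` helper are all made up for illustration, not taken from any real site. Note that robots.txt is only advisory; actually stopping a crawler that ignores it takes server-side blocking.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: Googlebot may crawl everything, every other
# crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/listings") -> bool:
    """Return True if the hypothetical policy lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, allowed(agent))
```

A well-behaved competitor that honors this file gets nothing, while Googlebot gets the whole site, which is the leverage I am describing.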

[1] They protect Crunchbase and many media companies.