by scarygliders 1690 days ago
> You can think of it this way, the prices and product data are publicly visible already on the website, there are no real secrets, none of it is password protected.

There's the problem right there. The prices and product data are publicly visible because there is a target audience of /humans/ for whom the site is designed and by whom it is intended to be used. The site is not there to cater to a competitor's scrapers.

I don't care how much people couch their unethical behaviour in "the data is publicly available"; the basic fact is that most if not all websites exist for human eyeballs to look at them. They do not exist for arseholes to DoS them by inundating them with scrapers.


From my perspective, the problem is that the data that is offered isn't really "for humans". The data is for convincing the humans to buy/pay or worse, browse and watch ads as a result.

But overall, information is one of those goods that has intrinsic properties like no other. It can be copied, infinitely. And we haven't yet figured out the dynamics of how to reason about it, so it feels like we're pretending they're physical goods.

Edit: side note. I'd go further and say that some of the data is even worse: it's "offered" with the real intention of confusing users into performing non-optimally in the market. Look at Amazon/eBay/AliExpress/Google listings for evidence of that. Just look at Google: Google is an ML and scraping powerhouse, and the best it can muster is results spammed with fake websites and duplicate/confusing listings.

You hit the nail on the head. It's hard to have sympathy for site operators complaining about scraping when almost every site does its best[0] to make using it a time-consuming, potentially risky and overall annoying ordeal. Not to mention, information asymmetry is anathema to a well-functioning market, and yet the no. 1 reason for fighting bots given in the whole thread here is a desire to maintain that information asymmetry.

And that's also the dirty secret behind the "attention economy": its whole point is to make things as inefficient as possible, because if you're making money on people's attention, you first need to steal it (by distracting them from what they're trying to achieve), and then either direct it towards your goals (vs. those of the users) or stretch it out to maximize their exposure to advertising.

--

[0] - Sometimes unintentionally. Unfortunately, the overall zeitgeist of UX design is heavily influenced by bad players, so default advice in the industry is often already intrinsically user-hostile.

> Not to mention, information asymmetry is anathema to a well-functioning market, and yet the no. 1 reason for fighting bots given in the whole thread here is a desire to maintain that information asymmetry.

This is exactly right.

> the basic fact is most if not all websites exist for human eyeballs to look at them.

There's a whole ethical subthread here about websites trying to make the experience miserable for those humans, and taking away the agency they need to protect themselves from that. A browser is a user agent. So is a screen reader. So is a script one writes to avoid dealing with bullshit fluff, when all one wants is a simple table of products, features and prices.

I agree 100%, but it is a fact of life, and sometimes it's better to just minimize the fuss and focus on the things that matter.

Your argument is perfectly valid and applies to offline activities as well (what stops a competitor from walking through the aisles of a Walmart or Costco?), but this is a battle that can't be won, there are too many parasitic actors. It is human nature.

Understanding your competitor's pricing is not "parasitic", it's research. Every company I've ever worked for that sells something online scrapes their competitors in some way (whether with bots or with interns).
I would say it's the opposite of parasitic. It's essential to having a well-functioning free market.

> (what stops a competitor from walking through the aisles of a Walmart or Costco?)

That's a significant portion of Nielsen's business model.

Let's not encourage these unethical people to even think of using human eyeballs and manual data entry for their scraping instead of bots. That sounds pretty darn unethical.