HN Mirror

	by jastingo 847 days ago
	What are the legitimate (i.e. legal) use cases for a product such as this? I agree with another comment that called this "Abuse as a Service". It seems to me this product's design goal is nothing more than to circumvent measures site owners take to prevent abuse of their site and run a sustainable business.

11 comments

PathfinderBot 847 days ago

What about something like Nitter? Archiving? Adversarial bridging between different platforms? Automation?

How will well-behaved scrapers undermine the sustainability of a business? I guess adblocking is one, but we can already do that with uBlock and that's legal. Or adversarial bridging, but that only serves to boost competition.

In other words, the question is flipped; why would well-behaved (i.e. non-DDoSing) scrapers be illegal?

jastingo 846 days ago

I think you're conflating automation with the intentional avoidance of bot detection as part of automation. The issue I have is not that this service allows users to automate browsing activities. The issue is that this service deliberately tries to avoid being detected as automated browser activity, because websites are trying to prevent bots. There are LOTS of services that allow users to create automations without disguising themselves. If you are using well-behaved scrapers that respect the TOS then you shouldn't have to use a service like this.

Nitter is an example of a service that explicitly disrupts Twitter/X's way to make money. If they can't make money then they can't provide the service, there would be no Twitter/X, and hence no Nitter. Of course they would try to prevent that kind of behavior and it should be obvious why. Resorting to using a service like this in order to continue using Nitter should raise some alarm bells. Sure you can still do it and rationalize it however you want, but you have to acknowledge you're trying to get the value of the service without paying for it.

Perhaps there are cases where there is a dissonance between a website's TOS and how they are blocking bot traffic? That sounds like a valid gripe. Otherwise, I don't buy the argument.

PathfinderBot 846 days ago

That's fair enough. I think that falls under similar arguments to adblocking; it's against ToS, and affects the revenues of ad-supported businesses, but it seems like the popular view is to use it regardless.

judge2020 847 days ago

Legality isn't the question here. If you want to speak to the legality, circumventing a robots.txt that explicitly lists your bot's user-agent with 'Disallow: /' is unauthorized access (I imagine it's more nuanced for 'User-agent: *'). No website is required to allow anyone to visit, and a site can discriminate against any client or software any way it wants.
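
For reference, a minimal sketch of how a scraper could check robots.txt rules before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, the URLs, and the `is_allowed` helper are all made up for illustration; this only shows the mechanics of the check and says nothing about the legal question:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is banned from the whole site,
# every other client is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is blocked everywhere; other bots only under /private/.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/page"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/page"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/private/x"))
```

A well-behaved scraper would run a check like this before every request and skip any URL for which it returns False.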

yjftsjthsd-h 847 days ago

> Legality isn't the question here.

The question was literally,

> What are the legitimate (i.e. legal) use cases for a product such as this?

tzs 847 days ago

I've got a couple of things I've used browser automation tools for:

• I want to automate (or at least semi-automate) downloading bank statements. I've got ~14 accounts (checking, savings, credit card, IRA, investment, HSA) across 7 financial institutions.

It's tedious to go download statements from all of them manually.

• I want to save stories from FanFiction.net (FFN) for offline reading. FFN's terms allow automation as long as it doesn't operate faster than a human [1].

[1] From their TOS:

> You agree not to use or launch any automated system, including without limitation, "robots," "spiders," or "offline readers," that accesses the Service in a manner that sends more request messages to the FanFiction.Net servers in a given period of time than a human can reasonably produce in the same period by using a conventional on-line web browser.
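
One way to stay under a "no faster than a human" limit like the one quoted above is to enforce a minimum gap between requests. This is only a sketch: the `Throttle` class and the 2-second interval are assumptions for illustration, not a number taken from the TOS. The clock and sleep functions are injectable so the demo runs instantly:

```python
import time
from typing import Callable, List, Optional


class Throttle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


# Demo with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept: List[float] = []


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    fake_now[0] += seconds


throttle = Throttle(2.0, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    throttle.wait()  # a real scraper would fetch one page after each wait
print(slept)  # [2.0, 2.0]
```

In real use you would construct `Throttle(2.0)` with the default clock and call `wait()` before each page download.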

iamacyborg 847 days ago

> I want to automate (or at least semi-automate) downloading bank statements. I've got ~14 accounts (checking, savings, credit card, IRA, investment, HSA) across 7 financial institutions.

Could you not shoot an email to those institutions asking for a copy of the documents?

hrrsn 847 days ago

Not OP, but I do the same for ~7 accounts across 5 institutions. There's no need to contact them since you can manually download the statements, but it's a chore if you're doing it frequently. I usually run my script a few times a week.

mrmanner 846 days ago

> Could you not shoot an email to those institutions asking for a copy

They’ll respond within a few days, asking me to log into some web portal to prove that I am me, and then we’re back where we started

qingcharles 847 days ago

I'm genuinely scraping a certain social network that doesn't have an accessible API to do what I need. My user is logged in, and I just automate the logged-in browser to visit the pages and dump the data I need into a console.

If there was an accessible API to do what I need, I wouldn't do this because scraping sucks. I have to write 100 JavaScript edge cases to handle all the times the host's servers fail in very weird ways. Plus, walking DOMs on these shitty sites with 10,000 nested divs is not fun. GPT helps with this.

It's net-positive for the host though, as I upload a lot of valuable content that their users genuinely like, but it sucks that I have to be sneaky to get the data I need.

BoorishBears 847 days ago

I used their previously available bot detection defeat to add an import feature to my website: users could link to their creation on another site, and my site would scrape the publicly available content so they wouldn't need to re-enter all their data.

I've used their product many times actually, and I'm shocked that on Hacker News, of all places, no one's thinking of anything besides abuse. How often is it useful to get information from a webpage and apply it in a new context? Then think of how often said webpage is behind a Cloudflare bot detector.

daemonw 847 days ago

If it's the user's data, then under GDPR the other site is obligated to provide a way for them to download/transfer it, specifically with this use case in mind.

They are completely in the right to block you though, you're not the owner of that data, you might be breaking their TOS.

mrmanner 846 days ago

“In exercising his or her right to data portability pursuant to paragraph 1, the data subject shall have the right to have the personal data transmitted directly from one controller to another, where technically feasible.”

They’re not necessarily in the right to block you, if you’re the data subject or acting on their behalf.

BoorishBears 847 days ago

This is a non sequitur to my comment:

- GDPR doesn't require that it be a convenient export. Users want to paste a link on my site, click a button, and have it magically appear. Not fill out a form, dump their entire account, and sift through that.

- I never opined on the validity of blocking bots

- I never opined on whether it's breaking their TOS

Abuse implies a harmfulness. Giving users a quick import option from already public data isn't harmful.

yjftsjthsd-h 847 days ago

> If it's the user's data, then under GDPR the other site is obligated to provide a way for them to download/transfer it, specifically with this use case in mind.

In Europe, if the company is actually following the law, in theory yes.

> They are completely in the right to block you though, you're not the owner of that data, you might be breaking their TOS.

IANAL, but AIUI that's definitely not true in the United States and I suspect similar ideas hold elsewhere: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

Klonoar 847 days ago

There is a litany of posts on this very site that detail why HiQ vs LinkedIn is more nuanced than you're making it out to be. HiQ didn't ultimately have the slam dunk win that people think they did.

jonatron 847 days ago

You sell jastingo™ brand widgets, but you notice fakes are being sold on eBay, Amazon, AliExpress. You set up a scraper to search for jastingo widgets every day on every marketplace site, but you get blocked. So now you need an unblocker to enforce your copyright/trademark/patent.

iamacyborg 847 days ago

Why does it need to be that complicated? If marketplaces are selling fakes, get your lawyer to send them a letter.

jonatron 847 days ago

What if you have 10 brands, with 10 products each, and there are 10 marketplaces?

iamacyborg 847 days ago

Build a service that helps companies automate sending legal letters to marketplaces.

meowface 847 days ago

And that service will very likely be automatically scraping different marketplaces to detect the fake products each time they pop up again.

causalmodels 847 days ago

Bots acting on behalf of users should not be blocked, but we have spent several decades treating bots (except for the googlebot) as bad.

Like if I want to programmatically unsubscribe from a subscription, why should I have to do it myself?

theamk 847 days ago

That's a bad example: "programmatically unsubscribing" means giving spammers information that this address is alive. A much better solution is to report the unwanted email as SPAM, so the sender's reputation takes a hit.

(and for that 1% of the cases where the sender is not a spammer and the user knows it, they can just hit "unsubscribe" manually)

causalmodels 847 days ago

I’m talking about subscription services a user signed up for at one time

vhcr 847 days ago

That's a bad example, there's already the List-Unsubscribe header.
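
For context, `List-Unsubscribe` is a mail header (defined in RFC 2369) that carries one or more unsubscribe targets in angle brackets. A rough sketch of reading it with Python's standard `email` module follows; the message, the addresses, and the `unsubscribe_targets` helper are invented for illustration:

```python
import re
from email import message_from_string

# Hypothetical message; every address and URL here is a placeholder.
RAW_MESSAGE = """\
From: news@example.com
To: user@example.org
Subject: Weekly digest
List-Unsubscribe: <mailto:unsub@example.com?subject=unsubscribe>, <https://example.com/unsub?id=123>

Hello!
"""


def unsubscribe_targets(raw_message: str) -> list[str]:
    """Return the URIs listed in the List-Unsubscribe header, if any."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe", "")
    # Each target is wrapped in angle brackets; targets are comma-separated.
    return re.findall(r"<([^>]+)>", header)


print(unsubscribe_targets(RAW_MESSAGE))
```

A mail client can act on either target, which is why it only covers email lists and not account-based subscriptions.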

causalmodels 846 days ago

Subscription services like Netflix, not emails.

codedokode 847 days ago

I think they should introduce request rate limits per IP/domain, for example max 1 parallel request. In this case there will be no significant load, but the data can be scraped.
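
A minimal sketch of the "max 1 parallel request per IP" idea, assuming a server that can call a hook before and after handling each request. The `PerClientLimiter` class and the IP addresses are hypothetical, and a real deployment would more likely do this in a reverse proxy than in application code:

```python
import threading
from collections import defaultdict


class PerClientLimiter:
    """Cap the number of in-flight requests per client IP."""

    def __init__(self, max_parallel: int = 1) -> None:
        self.max_parallel = max_parallel
        self._in_flight = defaultdict(int)  # client IP -> active requests
        self._lock = threading.Lock()

    def try_acquire(self, client_ip: str) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        with self._lock:
            if self._in_flight[client_ip] >= self.max_parallel:
                return False
            self._in_flight[client_ip] += 1
            return True

    def release(self, client_ip: str) -> None:
        """Mark one request from client_ip as finished."""
        with self._lock:
            if self._in_flight[client_ip] > 0:
                self._in_flight[client_ip] -= 1
            if self._in_flight[client_ip] == 0:
                del self._in_flight[client_ip]  # keep the table small


limiter = PerClientLimiter(max_parallel=1)
print(limiter.try_acquire("203.0.113.7"))   # first request is admitted
print(limiter.try_acquire("203.0.113.7"))   # second parallel request is refused
print(limiter.try_acquire("198.51.100.2"))  # a different client is unaffected
limiter.release("203.0.113.7")
print(limiter.try_acquire("203.0.113.7"))   # admitted again after release
```

A refused request would typically get an HTTP 429 response, so a scraper can still collect everything, just one request at a time.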

Scraping is important, for example, to monitor competitors' prices and spot opportunities to raise your own prices.

And let's not forget that Google does a lot more scraping than anyone else and has ridiculous profits from it.

heipei 847 days ago

Scanning for malicious and phishing websites. These types of sites take advantage of free services like Cloudflare to block automated analysis tools and to tailor their phishing campaigns to very specific geographical locations and user groups.

acaloiar 847 days ago

I'm not a customer, but I have a use case that in my opinion should be legal.

For years I've used my own terminal UI player (di-tui) for di.fm. At some point in the not-so-recent past, di.fm added Cloudflare's WAF, which prevents me from using one of my app's features: managing channel favorites within the app.

To be clear, I'm a paying di.fm customer, and my app only works for paying customers. But now my preferred method of listening to di.fm is slightly hamstrung because Cloudflare's WAF sits between me and a little string token available to every browser that accesses di.fm (even non-paying customers).

Szpadel 847 days ago

For my own stuff I use a similar self-hosted solution: detecting when kids' lessons become available on a local portal. But to be fair, the cheapest option here ($200) isn't viable for non-business use.

PS: context on why I need automation for such a thing: those lessons are really popular and are announced at unpredictable times, and another spot might open up when someone drops out.

mrmanner 846 days ago

> What are the legitimate (i.e. legal) use cases for a product such as this?

Data portability! Tools like this can be used to allow individuals to export their data from hostile web services trying to hold it hostage.

Legal in the EU, with GDPR.
