Hacker News
by RoyGBivCap 1081 days ago
"Several hundred organizations (maybe more) were scraping Twitter data extremely aggressively, to the point where it was affecting the real user experience.

What should we do to stop that? I’m open to ideas."

https://twitter.com/elonmusk/status/1674898695534309378

"1. Scraping is already disallowed by T&C.

2. The scraping orgs dgaf & mask their IPs through proxy servers or through orgs that appear legit. For example, a recent massive scraping operation originating from Oracle IP addresses was just using their servers as a laundromat.

3. We absolutely will take legal action against those who stole our data & look forward seeing them in court, which is (optimistically) 2 to 3 years from now."

https://twitter.com/elonmusk/status/1674898695534309378

3 comments

> 3. We absolutely will take legal action against those who stole our data…

What does “our” refer to here? Does Twitter (i.e. Musk) own the data in any sense? Or does he mean it as “we the people’s data”?

Very off-putting to read that sentence. Obviously he’s trying to monetize the user-generated data in this LLM rush, as other avenues to monetization have flopped.

This also really sounds like he's trying to pretend his data is some kind of rare commodity, when the reality is that it's bottom of the barrel trash as far as text data for LLMs goes.

I imagine part of the terms of using Twitter give the corporation ownership of comments. As is their right.

I can't speak for him, just relaying the information.

But I'm happy to speculate: Organizations violated the Twitter ToS by scraping, and he's going to sue the organizations for it.

In light of hiQ Labs v. LinkedIn [1], I'm not sure that Elon has a cause of action that he's likely to succeed on with respect to suing scrapers.

1: https://en.m.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

Unless I misunderstood, he might actually have a case.

> In a second ruling in April 2022 the Ninth Circuit affirmed its decision.[5][6] In a November 2022 ruling the Ninth Circuit ruled that hiQ had breached LinkedIn's User Agreement and a settlement agreement was reached between the two parties. [7]

If they go behind the paywall and start scraping, he'll be able to sue and have a chance. I wonder how many scrapers will continue to scrape...
If he wants to be taken seriously, perhaps Mr. Musk can post the data somewhere others can read it? Maybe a Mastodon server?
I thought the typical response would be rate limit plus captcha.
Exactly. This is a (mostly) solved problem: if LinkedIn can do it without completely locking down the website, Twitter can as well.
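
For anyone unfamiliar with the "rate limit plus captcha" approach mentioned above, here is a minimal sketch of the rate-limit half using a per-client token bucket. This is only an illustration and not how Twitter or LinkedIn actually do it. The names `TokenBucket` and `check_request`, the rate and burst numbers, and the "serve" / "captcha" return values are all assumptions made up for the example.

```python
import time


class TokenBucket:
    """Hypothetical per-client token bucket (illustrative, not any site's real limiter)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the logic can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        # Refill based on elapsed time, capped at the burst capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Spend one token per request if one is available.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client identifier (IP, account, API key: an assumption here).
buckets = {}


def check_request(client_id, rate=1.0, capacity=5, clock=time.monotonic):
    """Return "serve" while the client is under its limit, else "captcha"."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate, capacity, clock)
    return "serve" if buckets[client_id].allow() else "captcha"
```

The idea is that normal users rarely exhaust the bucket, while an aggressive scraper hits the captcha path quickly. As point 2 in the quoted tweet notes, scrapers that rotate IPs through proxies weaken any per-client key, so this is only the "mostly" part of "mostly solved".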