>Per a report by 404 Media, Daniel van Strien, a machine learning librarian at AI firm Hugging Face, pulled 1 million public posts from Bluesky via its Firehose API for machine learning research, pushing the dataset to a public repository. Van Strien later removed the data due to the controversy that ensued; however, it serves as a timely reminder that everything you post publicly to Bluesky is, well, public.
See also the comments where Daniel apologizes. Bluesky hasn't changed anything about the shape of Twitter or the kind of harassment that occurs there. A mob of people with some vague enemy (AI bros) piles on the first target of opportunity, directing all their "kill yourself" energy at one individual at a time.
> hey man i don't know if anyone's told you yet but nobody likes you, you stink, your ideas are stupid, ai is loser shit, and you should go fuck yourself
> Get off this website, you fucking ghouls.
> Just go back to Twitter you Ai trash. Create your own work and stop trying to make money off of everyone else’s. No one wants you here.
> You're not sorry, you lying sack of shit. Get the fuck off this site. Better yet, get off the internet entirely, go live in the mountains and be a hermit for the next 50 years, you rancid fucking malignance.
> I’m using generative AI to make a rendering of you in a mass grave.
etc.
Keep in mind this is the post where he takes it down.
Bluesky themselves have made it abundantly clear they're with the artists on this one. The firehose isn't a license to use someone's art for commercial software development.