HN Mirror

by lurk2 64 days ago

They are almost certainly being financed by the AI lobby, as they have been open about providing API access to companies training AI in exchange for “donations.”[1][2][3] Having all of this data available online for free gives those looking for training data plausible deniability. It would turn into a huge legal headache if OpenAI had scraped Spotify directly, but if they launder it through a third party, they can at least try to argue they weren’t responsible for the infringement.

Spotify got started doing the same thing, though.[4]

[1]: https://annas-archive.gl/blog/llms-txt.html (“Making an enterprise-level donation will get you fast SFTP access to all the files, which is faster than torrents.”)

[2]: https://annas-archive.gl/blog/duxiu-exclusive.html (“We’re looking for some company or institution to help us with OCR and text extraction for a massive collection we acquired, in exchange for exclusive early access. After the embargo period, we will of course release the entire collection.”)

[3]: https://annas-archive.gl/donate (“Enterprise-level donation or exchange for new collections (e.g. new scans, OCR’ed datasets). […] We welcome large donations from wealthy individuals or institutions. For donations over $5,000, please contact us directly at Contact email.”)

[4]: https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files... (“Rumors that early versions of Spotify used ‘pirate’ MP3s have been floating around the Internet for years. People who had access to the service in the beginning later reported downloading tracks that contained ‘Scene’ labeling, tags, and formats, which are the tell-tale signs that content hadn’t been obtained officially.”)