> disappointed that the EU seems to be taking a "training on copyright data is a copyright violation" stance

On reading the text, I'm not convinced that they actually are. As far as I can find, the act mentions copyright of the training data only once, here:

> Any use of copyright protected content requires the authorization of the rightholder concerned unless relevant copyright exceptions and limitations apply. Directive (EU) 2019/790 introduced exceptions and limitations allowing reproductions and extractions of works or other subject matter, for the purposes of text and data mining, under certain conditions.

At first, "Any use of copyright protected content requires the authorization of the rightholder concerned" sounds like a strong anti-scraping stance, but the "unless relevant copyright exceptions and limitations apply" clause makes it nothing more than a restatement of how copyright works in general. The question is whether any exceptions or limitations do apply, and the fact that the text immediately points to the DSM Directive's exception for text and data mining implies the drafters see it as sufficient for machine learning datasets.

The "certain conditions" essentially just mean following robots.txt if it's for commercial purposes, which all scrapers I'm aware of already do regardless.
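
For anyone wondering what "following robots.txt" looks like on the scraper side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the `may_fetch` helper are all made up for illustration, and whether a robots.txt check is legally enough to satisfy those conditions is my reading, not something the code can settle.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is blocked from everything,
# every other crawler is only blocked from /private/.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url.

    Illustrative helper (not part of any library). A real scraper would
    download robots.txt from the target site instead of taking it as a string.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleTrainingBot", "OtherBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/x"):
            print(agent, url, may_fetch(ROBOTS_TXT, agent, url))
```

The point is that the opt-out is per crawler: a site can shut out a specific training bot while staying open to everyone else, and a compliant scraper just runs this check before each fetch.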