While the words in the articles are public domain, the bits in the files are not. So to free the articles, you have to take the bits, turn them into the words and then encode those into bits again.
What you describe seems to be the "sweat of the brow" doctrine, which was rejected in the US by the 1991 Supreme Court case Feist Publications v. Rural Telephone Service.
Quoting https://en.wikipedia.org/wiki/Copyright_law_of_the_United_St... :
> The Copyright Act, § 103, allows copyright protection for "compilations", as long as there is some "creative" or "original" act involved in developing the compilation, such as in the selection (deciding which facts to include or exclude), and arrangement (how facts are displayed and in what order). Copyright protection in compilations is limited to the selection and arrangement of facts, not to the facts themselves.
> The Supreme Court decision in Feist Publications, Inc., v. Rural Telephone Service Co. clarified the requirements for copyright in compilations. The Feist case denied copyright protection to a "white pages" phone book (a compilation of telephone numbers, listed alphabetically). In making this ruling, the Supreme Court rejected the "sweat of the brow" doctrine. That is, copyright protection requires creativity, and no amount of hard work ("sweat of the brow") can transform a non-creative list (like an alphabetical listing of phone numbers) into copyrightable subject matter. A mechanical, non-selective collection of facts (e.g., alphabetized phone numbers) cannot be protected by copyright.
The standard for creativity under Feist is extremely low, but a scan of an out-of-copyright work is a mechanical encoding into bits, which involves no creativity at all, so it does not create any new copyright protection.
Spending time and money on collecting the data is irrelevant to US copyright law.
I don't think digitizing a document has ever been ruled transformative enough to warrant its own copyright. A stylized photo of a document, on the other hand, might be.
But I could be wrong. Do you have a source?
Even in Aaron Swartz's case, they prosecuted on the grounds that his download bot constituted unlawful entry into their system.
For various reasons, I'm not able to provide a source for this. Suffice it to say, just because it seems likely that you would end up winning the case doesn't mean someone with deep pockets won't do everything they can to keep you from doing it.
Does that apply to OP's argument? I don't think so; but as in Aaron Swartz's case, lawyers can be quite creative when their clients are willing to pay.