Just my personal opinion, but when you have an indexed copy of the whole web, a few million OCRed-but-not-corrected books from previous centuries added to your LM are not going to improve 2015 speech recognition quality.
It would illustrate how language and ideas evolve over time. It would illustrate how language and ideas from different geographical sources differ or resemble each other, especially in pre-Internet periods. It would provide the source material that contemporary works reference. It would provide many, many other benefits.
Way, way more than a corpus of a few million published books, that's for sure. Hell, there are individual message boards that have a higher word count than millions of books. Wikipedia arbitration cases (these aren't articles, but rather an esoteric back channel for handling disputes between users) frequently reach novel length.
The average quality is going to be lower, of course.
The least interesting thing about the Mexican American War is what type of dash you use between Mexican and American. There are over twenty thousand words about that dash on wiki meta.
15,000 words would be okay if at the end of it there was some kind of consensus, or something that could be transferred to different articles.
People in the future are going to have a skewed image of us if they think meta wiki is representative.