Curious: since Standard Ebooks uses Project Gutenberg's work, why not contribute back instead of forking it into a separate project? Are there obstacles preventing this or making it less than desirable?
Our editions are totally different from what PG does, our goals are different, our technical approach is different, and our collections policy is different. We would rather have our own curated catalog on our own website than be another edition lost among many in PG's huge catalog.
PG does great work and we rely on them almost exclusively for transcriptions. But we're two friends working towards different goals.
Would it be possible to contribute back the corrections from proofreading so that others could benefit, if not some of the fancier formatting/fonts/etc.? Or is that prohibitively difficult due to what is effectively a one-way conversion from PG to your own format?
PG does great work and we rely on them almost exclusively for transcriptions
Until I got to this part of the comment I was thinking "Yay, an alternative to PG's godawful OCR transcriptions". Why would you reuse the worst part of Project Gutenberg?
I think what they're getting at is that it's a starting point: a pre-classification that a human then corrects. We're effectively talking about a labor-saving device for an otherwise tedious task.
Can confirm. Think of it like using an AI to do an initial pass at a conference transcription and then correcting the typos, rather than doing the whole transcription by hand. Even if it's only 85% accurate, you've still saved a boatload of time.
When I did "The Valley of Fear" as my first project, the PG text was used as the base, but if I encountered any kind of ambiguity in the text, I consulted at least a half-dozen other versions of the text via Google Books scans for agreement.
The team is also very particular about only using editions that have entered into the public domain. So if the first edition of a book just entered public domain, you must make sure that what you have produced only uses text from the first edition, and that you haven't inadvertently used a later edition as a base that may have included subsequent editorial changes.
Hmm, I'm fairly confident a large chunk of this work could be automated (correcting OCR errors). I would be happy to take a shot at this problem as a volunteer, if you're open to the idea?
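As a rough illustration of the kind of automation being proposed, the "consult several other versions for agreement" step can be sketched as a word-by-word majority vote. This is a hypothetical sketch, not how Standard Ebooks actually works: the function name `vote` is made up, and it assumes the versions have already been tokenized and aligned word-for-word, which in practice is the hard part. Positions without a strict majority are flagged for a human rather than silently decided.

```python
from collections import Counter


def vote(versions: list[list[str]]) -> tuple[list[str], list[int]]:
    """Pick the most common word at each position across aligned versions.

    Hypothetical sketch: assumes every version is already tokenized and
    aligned word-for-word (same length). Returns the voted text plus the
    positions where no strict majority agreed, for a human to review.
    """
    if not versions:
        return [], []
    length = len(versions[0])
    if any(len(v) != length for v in versions):
        raise ValueError("versions must be aligned to the same length")

    result: list[str] = []
    disputed: list[int] = []
    for i in range(length):
        counts = Counter(v[i] for v in versions)
        word, n = counts.most_common(1)[0]
        result.append(word)
        # Flag the position unless more than half of the versions agree.
        if n * 2 <= len(versions):
            disputed.append(i)
    return result, disputed


# Usage: an OCR base text checked against two other (made-up) scans.
base = "He tumed to the door".split()
scan_a = "He turned to the door".split()
scan_b = "He turned to tho door".split()
text, disputed = vote([base, scan_a, scan_b])
print(" ".join(text))  # He turned to the door
print(disputed)        # []
```

With only two versions, any disagreement is flagged, which matches the idea that ambiguity should send a human back to the scans rather than be resolved automatically.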
May I ask why you felt rudeness was appropriate here?
I had read the link, and it was not obvious from reading it why contributing back to Project Gutenberg did not make sense for them. In particular, I did not understand why it would be undesirable to contribute corrections to the text back "upstream" to the original source so that others could also benefit. I did not see any contradiction between doing so and the goals and benefits stated on their page.