| Project Gutenberg is a complete mess. After years of discussion, they still haven't chosen a master file format for their books.[1]
As a consequence, they are hosting thousands of books in many different file formats provided by Distributed Proofreaders[2] and some other sources.
But since no master format exists and conversion is ... well ... buggy, they have no leverage over their quality problems. The textual content may be fine, but the reading experience for the casual user shows how big the iceberg is under the surface.
Having seen endless flamewars[3] and several unsuccessful approaches to their formatting and complexity problems, I propose a clear reset. What about a nice fork with only a few important books that get proper treatment with regard to:
- master file format
- conversion (mainly HTML, PDF and e-reader formats)
- design and page layout (e.g. for PDF versions)
From then on, one could build a growing git repository of _nice_ books and build an infrastructure around that.
Tackling their current mess directly (and taking on the burden of their internal politics and historical toolchain) would be completely in vain, as the past has shown. I don't have much time to spare. But for a really good cause (and the future of book reading is a really good cause), I am willing to invest. Who is in?
[1] This one is the latest candidate: http://www.gutenberg.org/wiki/Gutenberg:RST Before that, there was one guy working on a simplified TEI format. But that never took off.
[2] http://www.pgdp.net/c/
[3] The home of many a flamewar: http://blog.gmane.org/gmane.culture.literature.e-books.guten... |