Hacker News
by mmastrac 1584 days ago
The quality of transcription on Gutenberg is rough, especially for older transcriptions. Standard Ebooks editions are much higher quality, but the selection is limited because of the effort each book takes.

I processed two books from Gutenberg for SE (Devil's Dictionary and a smaller sci-fi novel), and both took quite a bit of work to bang into shape (half the work was metadata enhancement, half was proofreading and correcting).

EDIT: After comparing, it's definitely just the raw Gutenberg scan w/formatting. You can see a big batch of fixed typos that weren't applied here: https://github.com/standardebooks/ambrose-bierce_the-devils-...

1 comments

Perhaps, but Gutenberg has been adding a huge amount of good-quality content of late, thanks to their Distributed Proofreaders community. Older content will soon be a small fraction of the total, and much of it will be picked up and updated to current standards.
The DP stuff is better, for sure. The first book I did was DP and it was far less 'buggy' than the older one. The issues were mostly formatting.