Hacker News
by svat 27 days ago
Have you considered having a detailed version history for each book (etext)? The process of submitting fixes to typos etc. in books involves sending an email (https://www.gutenberg.org/help/errata.html) and although the last time I did this (2011) the fixes did get applied reasonably quickly (couple of days), it all felt a bit opaque. The version history could also include the project (usually PGDP, correct?) the etext originated from; that way one would be able to compare against the actual page scans.

I have very mixed feelings about Standard Ebooks and would much prefer being able to use Project Gutenberg directly, but one good thing Standard Ebooks does is that every book has an associated git repository (on GitHub), so it's (in principle) possible to see a history of fixes to the text over time.


We're using git repos internally to keep history for each book. They existed on GitHub for a while, but our implementation was awkward, and too big a project for the volunteer dev team. But it's likely that we'll evolve towards that.
> I have very mixed feelings about Standard Ebooks[…]

Why?

I was hoping to reply to this in detail but as I never got around to it, I'll keep it short: mostly it's about the editorial changes they make to the text, modernizing spelling etc. Many of the changes are unjustified IMO, and often detract from the charm of the original, and I'm uncomfortable reading a text I know has been tampered with in this way. Of course it's their project and they can do whatever they want, and they clearly love books, so with strong opinions there will be some that I may disagree with. I'd much rather read books from Project Gutenberg or Wikisource, both of which don't even correct obvious typos without marking up in some way that they've done so.

I also have many positive things to say about Standard Ebooks, but I don't think you were asking about those. :)

----

Edit: Without going into what I think are the most egregious sort of changes they introduce (which I think will require a longer post) and limiting myself to ones easy to find immediately:

See the earlier discussion (linked in a sibling comment here) where the editor-in-chief says it's ok to change punctuation because "The sounds out of his mouth do not include an apostrophe whether it's there in the spelling or not." (a very American view IMO): https://news.ycombinator.com/item?id=16956931

And looking at a recent commit on one of their books, here's a recent (https://github.com/standardebooks/agatha-christie_the-secret...) revert of one of their aggressive "modernizations" from 2024 (https://github.com/standardebooks/agatha-christie_the-secret...), that had, in line with their usual practice, changed "every one" to "everyone" (in one place even when referring to "a good many risks"), and the same commit made other changes (including one still present) like "they ought to have it lithographed. It must be a frightful nuisance doing every one separately." having the last four words turned into "doing everyone separately."!

On the “every one” example, that’s a definite mistake that shouldn’t have made its way into the book in the first place. The production process has a specific step for “every one” (https://standardebooks.org/contribute/producing-an-ebook-ste...) that guides producers through making the correct choices when modern usage has two different possible choices. It shouldn’t have happened, but it’s a mistake that was fixed at least.
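To make the “every one” pitfall concrete, here is a minimal Python sketch of a checker that only flags occurrences for a human to review instead of rewriting them. It is not Standard Ebooks' actual tooling; the function name `flag_every_one`, the "of" heuristic, and the second sample sentence are assumptions made up for illustration.

```python
import re

# Matches "every one" written as two words, case-insensitively.
EVERY_ONE = re.compile(r"\bevery one\b", re.IGNORECASE)

def flag_every_one(text, width=30):
    """Return (offset, context, hint) for each "every one" found in text.

    Nothing is rewritten. The hint is a rough heuristic for a reviewer:
    "every one of ..." means "each one", so it is marked "keep";
    everything else is marked "review" and left for a human to decide.
    """
    results = []
    for m in EVERY_ONE.finditer(text):
        following = text[m.end():m.end() + 4].lower()
        hint = "keep" if following.startswith(" of ") else "review"
        context = text[max(0, m.start() - width):m.end() + width]
        results.append((m.start(), context, hint))
    return results

# First sentence is the one quoted in the thread; the second is invented.
sample = ("It must be a frightful nuisance doing every one separately. "
          "I want every one of you to listen.")
for offset, context, hint in flag_every_one(sample):
    print(offset, hint, repr(context))
```

A blind search-and-replace would turn the first sentence into "doing everyone separately", which is the error described above; listing the matches with context keeps the decision with the producer.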
Your comment makes it sound as though the mistake was introduced by an inexperienced contributor who did not read the guide, when in fact it was introduced by the founder/editor-in-chief of the project. :) And in case it wasn't clear, only one of the mistakes was reverted, and the other one I quoted is still present in the book even as of this moment.

More broadly, the position of Standard Ebooks is that a modern reader would be distracted by spellings like "some one" and "every thing", and by time written like "2.30" instead of "2:30", and that books in British quotation style must be converted to American quotation style. I think most readers can in fact tolerate such small differences, and this position is frankly insulting — the punctuation and spelling of works are part of their character, and if anything, I'm more distracted by such anachronisms in style introduced as part of the Standard Ebooks process.

And to be honest, that position is totally reasonable, and the good thing is that you have the option of Gutenberg, Faded Page, and a bunch of other archival sites, also for free, if you don’t want that.

But nearly all print publishers also do what SE does. Why do you think they do, when it costs additional money and time to do that? A reasonable answer is that some, or a majority of, people prefer it.

> But nearly all print publishers also do what SE does.

Do they? To check, I tried to find a recent publication of Agatha Christie, and found the collection “Country Christie: Twelve Devonshire Mysteries” which says “First published by HarperCollins Publishers Ltd 2025”. It still has British-style punctuation (throughout the book), and times like “1.30”, “9.30”, “11.30”, “7.30 a.m.”, “12.30 p.m.”, and “8.30”. I checked a couple of other recent publications and admittedly they do modernize (though not in phrases like “every one of you”), but again I found the collection “The Last Seance: Haunting Tales from the Queen of Mystery” (2019) which does not. So it seems mixed.

In any case, I think it's fine to do what Standard Ebooks does, and if it were instead called something like “Modernized Ebooks with American punctuation”—if readers would know before picking one up—it would be totally unobjectionable. The name “Standard” gives the wrong impression. It's a bit like colorizing old black-and-white movies (or dubbing foreign-language movies instead of subtitling them): yes possibly even a majority of people may prefer it, but IMO it would be good to be more explicit what has been done.

It splits the community and the number of possible volunteer hours, for one. It also splits the canon into different versions. More projects fight for the attention (and possibly donations) of the audience.

There are lots of reasons it could be preferable to centralize. OTOH their mission is limited and some competition is healthy, if only to explore alternative ways to do things.

It’s a different mission.

PG focuses on an accurate digital translation of the source material, sometimes hosting multiple different versions of the same text, and doing things like putting work into recreating the adverts at the back of some novels.

SE focuses less on preservation and more on making readers’ versions of the texts, like other publishing imprints. So there’s typography standardisation, a light-touch modernisation of hyphenation and soundalike spelling, and things like author-wide collections of short fiction and poetry even if they didn’t previously exist.

Both are valuable, but they serve different segments.

Not the GP, but I also have mixed feelings about Standard Ebooks. They modernise texts for American readers. This means changing the punctuation, merging some words, altering the syntax, etc.

When I read an old novel, written two centuries ago in England, the little differences to modern English are part of the charm, and I certainly don't want any Americanism mixed in. For one of my favorite novels, The Forsyte Saga, the author deliberately used some rare forms of words, which SE replaced with the mainstream forms.

SE editor in chief here. What you describe is incorrect. The only thing we do is very light sound-alike spelling modernization, like "to-night" -> "tonight". We do not do things like change from en-GB to en-US, replace old words with different modern words, or change text for "American readers", whatever that means. I have no idea where you got that impression.

I personally worked on The Forsyte Saga. If you think something was done in error, please let us know and we'll be happy to fix it.

I commented on this kind of editing several years ago:

https://news.ycombinator.com/item?id=16957359

The edit is still in place, and I still maintain that changing 'phone to phone in dialogue changes the meaning.

Yeah, that edit clearly changes the meaning of the text.
> The only thing we do is very light sound-alike spelling modernization, like "to-night" -> "tonight".

Curious. Why even bother?

Guess: screen readers and such.
One could argue that this falls into the previous poster's thought about "the little differences to modern English are part of the charm" ...
You may already be aware, but SE marks all commits making those kinds of changes as '[Editorial]', so it is generally trivial to use their tooling to build your own high-quality ebook without any of the editorial changes.
When I tried this in the past, it was non-trivial because the editorial changes are mixed with the technical changes. Reverting the editorial changes broke the technical changes.
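For anyone who wants to see how entangled the two kinds of commits are before attempting this, a rough Python sketch along these lines separates commit subjects by the '[Editorial]' marker mentioned above. It assumes the marker appears at the start of the commit subject, which is not confirmed in this thread; the helper names `split_editorial` and `read_log` are hypothetical. It only lists commits and does not solve the problem of reverts breaking later technical changes.

```python
import subprocess

def split_editorial(log_lines, tag="[Editorial]"):
    """Partition "hash<TAB>subject" lines into (editorial, other) lists.

    Assumption: editorial commits start their subject line with `tag`.
    """
    editorial, other = [], []
    for line in log_lines:
        commit_hash, _, subject = line.partition("\t")
        target = editorial if subject.strip().startswith(tag) else other
        target.append((commit_hash, subject))
    return editorial, other

def read_log(repo_path):
    """Read "hash<TAB>subject" lines from a local clone using git log."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%H%x09%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()

# Usage against a local clone (the path is a placeholder):
#   editorial, other = split_editorial(read_log("path/to/book-repo"))
#   print(len(editorial), "editorial commits;", len(other), "others")
```

Comparing the files touched by the two groups would give a first idea of where a revert is likely to conflict with later technical commits.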
SE sounds truly, truly awful. Thanks for making me aware of its existence so I can avoid it.
They're providing beautifully made ebooks for free...

The only thing they are is truly, truly wonderful.

But why not be true to the original author's text? What's the need to modify it?
Not parent, but while I can appreciate your viewpoint, I would like to point out that many many many books have abridged, reworded, simplified, or disambiguated versions for different audiences.

The Bible is I daresay the most famous of these. Translations aside, even the English versions have had significant alterations done to wording, spelling, and meaning depending on the version.

There's also the Great Illustrated Classics imprint for certain classic novels like H.G. Wells's The Invisible Man. (I read that one like 10 times as a kid and it's what got me into sci-fi as a whole I'd argue. Haha.)

Whether these alternate versions are good or bad is obviously up for debate and depends on the person, but I'm just saying that what SE does is hardly new in the publishing world.

SE is an amazing and wonderful resource
I believe our new-ish CEO Eric Hellman actually did some work on something very similar
That's an interesting idea. Not a small feat to accomplish, though...