by acabal 233 days ago
I'm shocked and saddened to hear this. Greg was a deep source of knowledge and support as I started and shepherded Standard Ebooks. He was generous with his time and experience, and unbelievably patient with me, some guy he had never heard of or met before who was just another cold-email in what must have been an endless stream in his inbox. We should all aspire to his high spirit of camaraderie, charity, and kindness. The world has lost a champion of both literature and the free web.

Why are there no unique numbers assigned to Standard Ebooks' ebooks? I understand that there is a cost associated with ISBNs, but it's very irritating to not have something that identifies them uniquely. Most (all?) aren't even in WorldCat, so I can't use OCLC numbers for that purpose either.
> no unique numbers

This suggests a misunderstanding of the Standard Ebooks process, which allows continual incremental corrections to the authoritative source of individual books (in XHTML, on GitHub). So, a truly unique identifier would only be valid to the production output(s) from a particular state of the Git-repo sources.

https://standardebooks.org/contribute/report-errors

Recall also that final user content is made available in multiple formats, currently at least six. Example:

https://standardebooks.org/ebooks/geronimo/geronimos-story-o...

Asynchronous to the correction process, Standard Ebooks updates its own production tools. So if an individual book's content requires correction, should the "respin" be done with top-of-tree (TOT) tools, or with the versions available at time of first publication? Disclaimer: I don't actually know which is current practice -- but using the TOT tool suite is obviously vastly easier.

For most practical purposes, I'd suggest the git-commit date, along with short substrings of author name and title, would suffice.
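
A minimal sketch of what such an identifier could look like. The function names, the substring lengths, and the sample author, title, and date are all hypothetical, not anything Standard Ebooks actually does:

```python
import re
from datetime import date

def slug(text, length):
    # Keep only ASCII letters and digits, lowercased, then truncate
    return re.sub(r"[^a-z0-9]", "", text.lower())[:length]

def make_id(author, title, commit_date):
    # Short author substring + short title substring + git-commit date
    return f"{slug(author, 6)}-{slug(title, 8)}-{commit_date:%Y%m%d}"

# Hypothetical values, for illustration only
print(make_id("Example Author", "An Example Title", date(2020, 1, 2)))
# → exampl-anexampl-20200102
```

The commit date would have to be read from the book's repository; it is passed in by hand here.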

>This suggests a misunderstanding of the Standard Ebooks process, which allows continual incremental corrections to the authoritative source of individual books (in XHTML, on GitHub). So, a truly unique identifier would only be valid to the production output(s) from a particular state of the Git-repo sources.

Well, one of us has a misunderstanding. A printer strikes off the printing number from the colophon for each subsequent printing, but they don't actually issue a new ISBN. That stays the same. If they wanted to include a version number too, I wouldn't mind that, but it's not nearly as necessary as this. I use the year as a rough version number in the file names as well.

>Recall also that final user content is made available in multiple formats, currently at least six. Example:

I don't need them to issue a number per file format, but if they want to... that doesn't bother me. That's sort of self-evident which of the formats it is, after all.

>I'd suggest the git-commit date, along with short substrings of author name and title, would suffice.

It doesn't. A number of authors have at one time or another released books with similar or identical titles that are not the same book. This is the trouble... someone who uses or would use the books is asking for something that is missing but easy to supply, and instead of a "well gee, we never considered that, let us think about it" I have a dozen assholes crawling out of the woodwork to say "no, you're doing it wrong".

I need unique identifiers that are human readable. I just do. The world discovered this need for books before you were born. They invented a global standard, even. There is an entire field of science out there about this, that you seem to be ignorant of even existing. I've been doing this for years, and I keep bumping up against it. But you think it can be solved because you used git and know about hashes or whatever, and it's just like what you deal with in your software development job!

> very irritating

I think it’s possible to express this in a less caustic way. After all, Standard Ebooks is high quality and free of charge, right?

The ebook identifier uniquely identifies every ebook. Standard Ebooks ebooks use the URL as their unique identifier.
Those are poor identifiers. A numeric or short alphanumeric identifier that can be part of the filename is important... I have as many as 5 different editions of the same title, so title+author doesn't do the trick. Nor am I putting a URL into the filename; I couldn't if I wanted to, as a URL contains characters that are disallowed in every filesystem I've ever heard of. How difficult is it to keep an incrementing catalog number like Project Gutenberg does? Anything that doesn't have a proper unique identifier just seems unprofessional.
This isn't a solution either. Not sure why you think it is. Here's how I name files, just as an example:

    Meditationes de Prima Philosophia - GTNB•0023306 (2007) - Descartes, René (aut)

    Meditations on First Philosophy - 9780203417621 (2013) - Descartes, René (aut); Haldane, Elizabeth (trl); Ross, G. R. T. (trl) & Tweyman, Stanley (edt,wfw)
Where and how should I put a URI in there, especially considering that URIs at minimum need the colon (:), which is a problematic character in filenames on NTFS/HFS/APFS/XFS? It's not exactly disallowed, but it creates a resource fork or some shit and so it doesn't behave as you would expect. If Standard Ebooks just started numbering their books, then I'd slap the STBK• in front of the number and use that. They're not in WorldCat, or I could use OCLC numbers (but it shouldn't be other people's job to keep the catalog of their own books).
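
To make the filename problem concrete, here is a rough sketch. The helper names are made up for illustration, and the reserved set is the one Windows rejects in file names, used here as a conservative stand-in for "problematic" characters:

```python
# Characters Windows rejects in file names; ':' is among them
RESERVED = set('<>:"/\\|?*')

def problem_chars(name):
    """Return the reserved characters found in a candidate file name, sorted."""
    return sorted(set(name) & RESERVED)

def catalog_filename(title, prefix, number, year, credits):
    # Mirrors the "Title - PREFIX•number (year) - credits" pattern above
    return f"{title} - {prefix}•{number} ({year}) - {credits}"

name = catalog_filename("Meditationes de Prima Philosophia", "GTNB",
                        "0023306", 2007, "Descartes, René (aut)")
print(problem_chars(name))                                  # → []
print(problem_chars("https://standardebooks.org/ebooks/"))  # → ['/', ':']
```

A short catalog number drops into the pattern cleanly; a URL does not, because of the colon and slashes.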
Choose your favorite hash:

    hash(<dc:identifier>)
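
Spelled out, that could look something like the sketch below. SHA-256, Base32, the 8-character truncation, and the sample identifier string are all arbitrary choices for illustration, not an established scheme:

```python
import base64
import hashlib

def short_id(dc_identifier, length=8):
    """Derive a short, filename-safe ID from a dc:identifier string."""
    digest = hashlib.sha256(dc_identifier.encode("utf-8")).digest()
    # Base32 uses only A-Z and 2-7, so the result is safe in file names
    return base64.b32encode(digest).decode("ascii")[:length]

# Hypothetical identifier value, for illustration only
example = "url:https://standardebooks.org/ebooks/example-author/example-title"
print("STBK•" + short_id(example))
```

Truncating the hash makes collisions possible, and the result is stable only as long as the identifier string never changes. It also carries no meaning a human can read off, unlike a sequential catalog number.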
Have a little respect, for fuck's sake. This does not belong here.