by opminion 233 days ago
https://en.wikipedia.org/wiki/Uniform_Resource_Identifier

This isn't a solution either. Not sure why you think it is. Here's how I name files, just as an example:

    Meditationes de Prima Philosophia - GTNB•0023306 (2007) - Descartes, René (aut)

    Meditations on First Philosophy - 9780203417621 (2013) - Descartes, René (aut); Haldane, Elizabeth (trl); Ross, G. R. T. (trl) & Tweyman, Stanley (edt,wfw)
Where and how should I put a URI in there, especially considering that they at minimum need the colon (:), which is a problematic character in filenames on NTFS/HFS/APFS? They're not exactly disallowed, but on NTFS a colon creates an alternate data stream or some shit, and on HFS/APFS it's the legacy path separator, so it doesn't behave as you would expect. If Standard Ebooks just started numbering their books, then I'd slap the STBK• in front of the number and use that. They're not in Worldcat, or I could use OCLC numbers (but it shouldn't be other people's job to keep the catalog of their own books).
choose your favorite hash

    hash(<dc:identifier>)
Hashes are too long, aren't human-recognizable as to meaning, etc. I don't want half-assed workarounds. They need to uniquely number their books.
- they don’t need to do anything to conform to your arbitrary organization choices

- hashes are as long or short as you need them to be

- publication timestamp is in every ebook’s metadata, is almost guaranteed to be unique, monotonically increases, and has actual semantic meaning compared to an ISBN or OCLC number
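
For anyone who wants to try the timestamp idea, here is a minimal sketch of pulling `dc:identifier` and `dc:date` out of an OPF file with the Python standard library. The function name `read_opf_fields` is made up for illustration, and it assumes the OPF uses the standard Dublin Core namespace. A missing field comes back as `None` rather than being guessed.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by <dc:identifier> and <dc:date> in OPF metadata.
DC = "{http://purl.org/dc/elements/1.1/}"


def read_opf_fields(opf_xml: str) -> dict:
    """Return the first dc:identifier and dc:date found, or None for each one missing.

    Hypothetical helper, not part of any ebook toolchain.
    """
    root = ET.fromstring(opf_xml)

    def first(tag: str):
        # Search the whole tree; take only the first matching element.
        el = root.find(f".//{DC}{tag}")
        return el.text.strip() if el is not None and el.text else None

    return {"identifier": first("identifier"), "date": first("date")}
```

Feed it the contents of the package document (the `.opf` file inside the EPUB zip) and check the `date` entry before relying on it.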

>they don’t need to do anything to conform to your arbitrary organization choices

They don't need to. It'd be smart. It's not "arbitrary". It's fucking library science.

>hashes are as long or short as you need them to be

Hashes might uniquely identify a computer file, but they don't uniquely identify an edition/release of a published book. Some jackass on libgen decides to tweak a single byte, now it has a new hash... but it's not a new edition.

>publication timestamp is in every ebook’s metadata

As someone who takes a look at every internal opf file, no... it's not in every ebook.

You're suggesting I go to the extra trouble of doing a job they could do easily, when I can only do it poorly, and I don't know why... because the first person to respond was a dumbass and thought I was attacking him? I swear, 99% of humans are still monkeys.

You don't need to hash file contents (though that is often a useful thing to do). You can hash e.g. the URL that was earlier claimed to be the canonical identifier. Running it through your favorite hash function fixes your complaints about file names (choose your favorite hash function such that it is not too long and only outputs allowed characters).
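
A rough sketch of what that could look like, assuming SHA-256 as the hash and Base32 as the output alphabet (both are my choices, nothing the publisher specifies). The function name `short_id` and the example URL are hypothetical. Base32 only emits `A-Z` and `2-7`, so there are no colons or slashes, and the token can be cut to whatever length you can live with.

```python
import base64
import hashlib


def short_id(identifier: str, length: int = 10) -> str:
    """Derive a short, filename-safe token from an identifier string (e.g. a URL).

    Hypothetical helper: the input is the identifier text, not the file contents,
    so changing bytes inside the ebook file does not change the token.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Base32 output uses only A-Z and 2-7; strip the '=' padding before truncating.
    token = base64.b32encode(digest).decode("ascii").rstrip("=")
    return token[:length]


# Usage: the URL below is a placeholder, not a real catalog entry.
print(short_id("https://example.org/ebooks/some-book"))
```

Shorter tokens collide more easily, so the length is a trade-off between readability and uniqueness across your own library.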