by ndriscoll
237 days ago
You don't need to hash file contents (though that is often a useful thing to do). You can hash e.g. the URL that was earlier claimed to be the canonical identifier. Running it through your favorite hash function fixes your complaints about file names (choose your favorite hash function such that it is not too long and only outputs allowed characters).
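A rough sketch of what that could look like in Python, assuming SHA-256 truncated to 16 hex characters is acceptable for your collection size. The function name `short_id`, the 16-character default, and the example URL are all made up for illustration, not anything from the thread:

```python
import hashlib


def short_id(canonical_url: str, length: int = 16) -> str:
    """Return a short, filename-safe identifier derived from a URL."""
    digest = hashlib.sha256(canonical_url.encode("utf-8")).hexdigest()
    # Hex output contains only 0-9 and a-f, so it is safe in file names.
    # Truncating trades collision resistance for brevity:
    # 16 hex characters keep 64 bits of the hash.
    return digest[:length]


# Hypothetical URL, used only for illustration.
filename = short_id("https://example.org/books/some-title") + ".epub"
```

The same URL always maps to the same name, so the file can be found again as long as the URL is recorded somewhere. A shorter `length` gives shorter names at the cost of a higher chance that two URLs collide.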
|
>choose your favorite hash function such that it is not too long
An ISBN's 13 digits are about as long as is tolerable. Any time there is a list of authors six names long (academic titles) along with a subtitle, it's very easy to bump up against the maximum filename length.
This isn't a problem I can solve on my own; I'm just trying to bring attention to it. My solution thus far is to avoid publishers who are so unprofessional as to not provide numbers. It's not tough: Project Gutenberg does it, and anyone can. If you're some amateur whose entire catalog is 8 books, you say "this book is 1, and this book is 2," etc., and it's a done deal. Again, I don't expect anyone to use ISBNs (in the US, you have to pay for them unless you're one of the big 5 publishing houses), but just use your own numbers, for god's sake.