by 23skidoo 1332 days ago
I'm a little perplexed by the ISBN system. The whole centralized affair, where you have to purchase ISBNs, seems like a racket. ISBNs cost more in some countries (America) than they do in others (Canada), for no reason other than that the agencies can get away with it.

Much better would be a UUID generated from unique values, like a hash of the timestamp and publisher of a book. If you limit the length and number of the fields you hash to generate the UUID, you could even prove there will be zero collisions and eliminate any need for collision checks, and thus for an organization that charges money.

4 comments

ISBN was introduced in 1970. While hash functions did exist at this point (https://en.wikipedia.org/wiki/Hash_function#History) the computational resources generally available for this sort of thing were... rather lacking. The Apple II wasn't introduced until 1977.

I will leave figuring out which hashing functions were known back in 1970, and experimenting with calculating them by hand, up to you. :)

While archaic, ISBN doesn't seem a bad system to me.

Short values are more reliable in retail situations. They can be typed in by hand or read with cheap scanners.

You are of course free to publish without an ISBN if you don't care about the legacy ecosystem.

There's nothing stopping anyone from creating or promoting an alternative but I don't think the incentives are there. There's not enough money in it, and I don't think the cost savings are enough to make a switch compelling.

That's definitely an interesting question, why they don't use a longer identifier without central/hierarchical allocation. I don't have an answer, but some possibly relevant points:

* Rather than compute a hash you could just generate a random number: same risk of collision if done correctly (but different opportunities for making a mistake).

* When ISBNs were introduced in the 1960s people would have been typing and even handwriting them so keeping them short would have been important.

* ISBNs have now been incorporated into EANs (13 digits), which are used for all things sold by retailers, except in the USA and Canada, which, according to Wikipedia, use a system called UPC. (Ironically, the U stands for "universal" while the E stands for "European". Of course the 12-digit system got incorporated into the 13-digit system. Probably there will be a 14-digit system one day.)

* In a UK supermarket, if the barcode won't scan, someone has to type in the digits. I assume that in most cases they type all 13 digits but I haven't watched carefully. (Of course I am now inspired to watch more carefully next time it happens.) They could have a really clever interface connected to a real-time database of barcodes that recently failed to scan, since I expect whole batches of a product to have badly printed or crinkled packaging.

* A suitably designed 25-digit system would only take twice as long, or less than twice as long, to type in as the current 13-digit system, but the system would have to be suitably designed for that purpose. Having the computer tell the human at the end "there's a mistake somewhere" would be no good at all. At the very least you could have a check digit for each half and tell the human which half contains the mistake but of course you could do much better than that ...

* I have noticed that Sainsbury's (a major UK supermarket) has a system of 8-digit barcodes for its own products, but Tesco (another major supermarket) uses the standard 13-digit barcodes for its own products.

* ALDI products have giant barcodes printed in several places on the packaging without the corresponding digits printed underneath the barcode: the scanner will never fail!
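
On the check-digit points above, here is a rough sketch of how the existing 13-digit check works, plus the "one check digit per half" idea. The first function is the standard EAN-13 / ISBN-13 weighting (1, 3, 1, 3, ...). The second is purely hypothetical: it assumes a made-up 26-digit code built from two 13-digit blocks, each with its own check digit, just to show how the machine could tell the human which half to retype. The function names are mine.

```python
def ean13_check_digit(first12: str) -> int:
    """Check digit for the first 12 digits of an EAN-13 / ISBN-13."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """True if `code` is 13 digits and its last digit is the right check digit."""
    return (
        len(code) == 13
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )


def failing_halves(code26: str) -> list[int]:
    """Hypothetical long code: two 13-digit blocks, each checked separately.

    Returns the halves (1 and/or 2) whose check digit does not match, so a
    cashier would only need to retype the half that is wrong.
    """
    if len(code26) != 26 or not code26.isdigit():
        raise ValueError("expected exactly 26 digits")
    halves = (code26[:13], code26[13:])
    return [i for i, half in enumerate(halves, start=1) if not is_valid_ean13(half)]
```

A single mod-10 check digit catches every single-digit typo but not every transposition, so a scheme designed for hand entry could probably do better, as the bullet above suggests.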

> Much better would be a UUID generated from unique values, like a hash of the timestamp and publisher of a book. If you limit the length and number of the fields you hash to generate the UUID, you could even prove there will be zero collisions and eliminate any need for collision checks, and thus for an organization that charges money.

That's false. Your algorithm of hashing a timestamp and book publisher name cannot be proven to be collision-free.
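
A minimal sketch of the simplest failure, assuming the scheme is "name-based UUID over publisher plus timestamp" (I'm using Python's `uuid.uuid5` as a stand-in, and the `Book` type, the namespace, and the `|` separator are all made up for illustration). Two different books that share a publisher and a timestamp get the same ID, because nothing that distinguishes them goes into the hash:

```python
import uuid
from dataclasses import dataclass

# Hypothetical namespace for this illustration; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "books.example")


@dataclass(frozen=True)
class Book:
    title: str
    publisher: str
    timestamp: str  # e.g. an ISO 8601 string supplied by the publisher


def book_id(book: Book) -> uuid.UUID:
    """ID derived only from publisher and timestamp, as in the proposal."""
    return uuid.uuid5(NAMESPACE, f"{book.publisher}|{book.timestamp}")


a = Book("Volume One", "Acme Press", "2020-01-01T00:00:00Z")
b = Book("Volume Two", "Acme Press", "2020-01-01T00:00:00Z")

# Different books, identical inputs to the hash, therefore identical IDs.
same_id = book_id(a) == book_id(b)
```

Adding more fields makes this less likely but doesn't turn it into a proof. The only way to get a guarantee is to encode the fields directly instead of hashing them, and then someone still has to guarantee that the fields themselves are unique.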

but the probability of 16 completely random bytes colliding is extremely low.

Yes, but I was refuting a false point, that those bytes can be proven to never collide... Obviously, they can collide. In the real world, programmers should be prepared for random collisions, yes, but also for deliberately created collisions...

False assumptions are the bane of correct design and will cause an entire system to fail in unpredictable ways or be exploited without detection.
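
To put a rough number on "extremely low" for 16 random bytes, here is a sketch using the usual birthday approximation, p ≈ 1 - exp(-n(n-1)/2N) with N = 2^128. It assumes the IDs are uniformly random and independent, which is exactly the assumption that breaks when someone creates collisions on purpose. The function name is mine.

```python
import math


def collision_probability(n_ids: int, bits: int = 128) -> float:
    """Approximate chance that at least two of n_ids random IDs collide.

    Uses the birthday approximation; assumes uniform, independent IDs.
    """
    space = 2 ** bits
    exponent = n_ids * (n_ids - 1) / (2 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)


# A billion random 128-bit IDs: the probability is on the order of 1e-21.
p_billion = collision_probability(10**9)

# Around 2**64 IDs the chance of some collision is already roughly 39%.
p_sqrt = collision_probability(2**64)
```

So the probability is tiny for any realistic number of books, but it is a probability, not a proof of zero collisions.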