| HN Mirror

by thrdbndndn 1353 days ago

This is probably a total digression, but I wish IA would organize its digital library slightly better.

One day I was checking some manga books by ISBN on IA, just out of curiosity. For some reason, IA had put the ISBNs for all the volumes of a manga into one single entry (https://archive.org/details/isbn_1919979003907, check the "ISBN" metadata section), and unsurprisingly the actual content is only one volume, vol. 43 (not even vol. 1!). I have a feeling the other volumes may exist somewhere on the site, but there is no way to search for them.

This isn't a one-off occurrence either; it reflects my general experience of trying to find specific items there, especially non-English books.

4 comments

textfiles 1353 days ago

On a given day I'm moving tens of thousands of items around to make them easier to find. I'm sure I'll get to your section sooner or later.

jrajav 1353 days ago

Are you involved with IA? I'm actually really interested in what your day-to-day looks like. Could you share more?

coderintherye 1353 days ago

Jason's day-to-day is pretty well covered in his Twitter account: https://twitter.com/textfiles

Wingy 1353 days ago

textfiles is Jason Scott[0]

[0] https://en.wikipedia.org/wiki/Jason_Scott

mdp2021 1353 days ago

And while we are at it, K. Savetz (the submitter) is "manager of special collections at Internet Archive".

throwaway742 1353 days ago

Thank you for your service.

textfiles 1352 days ago

Every day is a joy.

giantrobot 1353 days ago

A lot of the time the metadata accuracy is up to the original uploader. IA's upload system doesn't magically fill in all the metadata details for an item.

adhesive_wombat 1352 days ago

It also doesn't allow others to update metadata, or even to submit changes for review.

Wikidata has a property for Internet Archive ID, so it wouldn't be conceptually hard to construct a parallel metadata store there, but it would involve hundreds of millions of triples so it's definitely "hard" in other senses.

mdp2021 1353 days ago

While I also wish the Archive were more precise (e.g. in the "Author" and "Year of publication" fields), I suggest that you check their RSS feeds to see how staggeringly high the rate of uploads is. That the uploading is "frenetic" (in a good way, of course) reveals where the focus is. Re-assessing and fixing the records would probably need a parallel team.

I would gladly help with that. I have never checked, but maybe one can volunteer.

JKCalhoun 1352 days ago

I agree. I had wondered how successful and easy it would be to create a "front end" site that does a better job of searching and organizing archive.org.
