by WalterBright 1114 days ago
Your phone camera, handheld, is plenty good enough to digitize each page, even if the pages don't lie flat. You could pay a student to just photograph each page. The cost is minimal.

Before anyone says "this will never work! It must be done by $$$$$ professionals! It requires $$$$ equipment!" just pick a book, any book, off your bookshelf, open it up, and take a phone photo.

P.S. It works better with daylight providing enough light through the windows.

2 comments

You make a good point, but there also could be more to it than that:

- Need to make sure the photographers are careful not to damage fragile pages

- Need a system of organization (syncing ten thousand default-named iPhone pics with no labels is not ideal)

- You might be ignoring important differences between modern published books on your bookshelf and these materials (e.g. maybe the font is not the same size, maybe the text is not modern English, maybe characters are not printed consistently, maybe pages are dirty, all of which could impact the OCR-friendliness of an iPhone pic compared to something else)

- There might even be valuable information in markings below the topmost visible layer which could be revealed by scanning equipment (especially for example if pages are stuck together)

And that's just off the top of my head, without real domain knowledge.
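
On the organization point, even a tiny script would go a long way. Here is a rough sketch, assuming the photos sit in one folder per book and that the default camera names (IMG_0001.JPG and so on) sort in the order the pages were shot. The function names and the "ledger1850" label are made up for illustration.

```python
from pathlib import Path


def plan_renames(filenames, label, start_page=1):
    """Map default camera filenames to labeled, zero-padded page names.

    Assumes that sorting the filenames alphabetically reproduces the order
    in which the pages were photographed (true for sequential names such
    as IMG_0001.JPG, IMG_0002.JPG, ...).
    """
    ordered = sorted(filenames)
    # Pad to at least four digits so the new names also sort correctly.
    width = max(4, len(str(start_page + len(ordered) - 1)))
    plan = []
    for offset, old in enumerate(ordered):
        suffix = Path(old).suffix.lower()
        new = f"{label}_p{start_page + offset:0{width}d}{suffix}"
        plan.append((old, new))
    return plan


def apply_renames(folder, label):
    """Rename every .jpg/.jpeg file in `folder` following plan_renames."""
    folder = Path(folder)
    photos = [p.name for p in folder.iterdir()
              if p.suffix.lower() in {".jpg", ".jpeg"}]
    for old, new in plan_renames(photos, label):
        (folder / old).rename(folder / new)
```

Calling apply_renames("photos/ledger1850", "ledger1850") would turn IMG_0001.JPG into ledger1850_p0001.jpg and so on. Keeping the planning step separate from the actual rename means you can print the plan and check it before touching any files.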

It's not about OCR or dirt. It's about taking an image. I doubt OCR would work on any of them, whether you use a $$$$$ archivist to photograph the pages or not.

As for below the topmost layer, you're right, an iPhone camera won't do it. But worrying about that comes much, much later.

ScanTailor Advanced will also help process the images into something resembling a readable scan.

But indeed, as long as you have some images you can dump them onto the Internet Archive for immediate posterity (and hope they don't go under when the lawsuit determines a penalty).