I made a high-quality scan of PAIP (Paradigms of Artificial Intelligence Programming), then worked on OCR'ing it and incorporating the results into an admittedly imperfect git repo of Markdown files. I used ScanTailor to deskew the pages and make other adjustments before running Tesseract via OCRmyPDF. I wrote up notes on some of my process at https://github.com/norvig/paip-lisp/releases/tag/v1.2 .
https://news.ycombinator.com/item?id=42952605 - Ingesting PDFs and why Gemini 2.0 changes everything