HN Mirror


by AlphaGeekZulu 946 days ago

Yep! In the search field above the library window, press the "FT" button at the far left. You have to build the index the first time you use the feature. That takes a while, but from then on new books are indexed automatically.

FT search has word and phrase search, boolean operators and NEAR search abilities. And there is a really cool match list, giving some context of the match before you actually go to it in the PDF file.

You cannot search across libraries, though.
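To illustrate what a NEAR search with a context-giving match list does, here is a minimal Python sketch. It is not Calibre's implementation. The function name `near_matches` and the parameters `max_gap` and `context` are made up for illustration, and Calibre's exact NEAR semantics may differ.

```python
import re


def near_matches(text, first, second, max_gap=10, context=4):
    """Return snippets where `first` and `second` occur at most
    `max_gap` word positions apart, in either order.

    Hypothetical helper for illustration only, not a Calibre API.
    """
    words = re.findall(r"\w+", text)
    lowered = [w.lower() for w in words]
    first, second = first.lower(), second.lower()

    # Word positions of each search term (case-insensitive).
    pos_first = [i for i, w in enumerate(lowered) if w == first]
    pos_second = [i for i, w in enumerate(lowered) if w == second]

    snippets = []
    for i in pos_first:
        for j in pos_second:
            if i != j and abs(i - j) <= max_gap:
                # Keep `context` words on each side of the matched span.
                start = max(0, min(i, j) - context)
                end = min(len(words), max(i, j) + context + 1)
                snippets.append(" ".join(words[start:end]))
    return snippets
```

Calling `near_matches(page_text, "fox", "dog", max_gap=5)` would return one short snippet per hit, which is roughly the kind of preview the match list gives you before you open the PDF.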

2 comments

AlphaGeekZulu 946 days ago

Note: As far as I know, Calibre does not do OCR, so full-text search will not find anything in a PDF that contains only scanned page images.


ggpsv 946 days ago

I've had good luck using Tesseract [0] for scanned PDFs. If you're not CLI-inclined, there are several GUIs for it available [1]. I have had good luck downloading scanned PDFs from archive.org and running them through Tesseract.

Did not know about Calibre for this - I was relying on opening each PDF and searching it individually.

[0]: https://github.com/tesseract-ocr/tesseract
[1]: https://www.opait.com/tessstudio/


kristofferR 946 days ago

OCRmyPDF is a tool using Tesseract, specifically designed for PDFs. I would recommend that over pure Tesseract.

https://github.com/ocrmypdf/OCRmyPDF
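For a whole folder of scanned PDFs, a small wrapper around the `ocrmypdf` command line can do the batch. This is only a sketch: the helper names `ocrmypdf_command` and `ocr_folder` and the `dry_run` switch are assumptions for illustration, and it assumes `ocrmypdf` is installed and on your PATH. The `--skip-text` and `-l` options are OCRmyPDF's own.

```python
import subprocess
from pathlib import Path


def ocrmypdf_command(src, dst, language="eng"):
    """Build the ocrmypdf call for one file (hypothetical helper)."""
    # --skip-text leaves pages that already have a text layer untouched;
    # -l selects the Tesseract language pack used for recognition.
    return ["ocrmypdf", "--skip-text", "-l", language, str(src), str(dst)]


def ocr_folder(in_dir, out_dir, dry_run=True):
    """OCR every *.pdf in in_dir into out_dir and return the commands.

    With dry_run=True nothing is executed, so the commands can be
    inspected first.
    """
    out_dir = Path(out_dir)
    commands = []
    for src in sorted(Path(in_dir).glob("*.pdf")):
        cmd = ocrmypdf_command(src, out_dir / src.name)
        commands.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if ocrmypdf fails
    return commands
```

Writing the results to a separate output folder keeps the original scans intact in case the OCR pass goes wrong. The searchable copies can then be added to Calibre so the full-text index picks them up.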


kristofferR 946 days ago

I recommend running any such PDFs through OCRmyPDF.

https://github.com/ocrmypdf/OCRmyPDF


Fluorescence 946 days ago

Oh, lol, I never knew that.

I created a script that dumped my library of 1000s of books to .txt and then grepped them.
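The original script isn't shown here, but the grep half could look roughly like this sketch. It assumes the books have already been dumped to `.txt` files under one folder (for example with Calibre's `ebook-convert` command-line tool). The function name `grep_library` is made up for illustration.

```python
import re
from pathlib import Path


def grep_library(root, pattern, ignore_case=True):
    """Yield (path, line_number, line) for every matching line in any
    .txt file under root. Hypothetical helper, not part of Calibre."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps going when a dump has bad bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, number, line.rstrip("\n")
```

Looping over `grep_library("books_txt", r"tesseract")` and printing each tuple gives output similar to `grep -rn`. It rereads every file on every search, though, which is exactly the work a prebuilt full-text index saves.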
