

	by AlphaGeekZulu 949 days ago
	Calibre is outstanding - really one of my most important research tools. Full-text search over the content of an entire library is a killer feature. I maintain a couple of hundred books in PDF format and sync them automatically with a Samsung tablet. Unfortunately, version 7.0 crashes on my Ubuntu 23.10, so I had to go back to 6.29.0.

5 comments

Fluorescence 949 days ago

If you run it from the terminal you can see the error. For me on 22.04 it was missing a dependency:

    sudo apt install libxcb-cursor0


AlphaGeekZulu 949 days ago

Yess, thanks! Stupid me.

The installer now also points out the missing dependency before installation.


hiAndrewQuinn 949 days ago

Wait, full-text search that also covers the contents of PDFs? I didn't know that, that's really big!


AlphaGeekZulu 949 days ago

Yep! In the search field above the library window, press the "FT" button at the far left. You have to create the index the first time you use the feature. That takes a while, and from then on new books are indexed automatically.

FT search supports word and phrase search, boolean operators and NEAR search. And there is a really cool match list that shows some context for each match before you actually go to it in the PDF file.

You cannot search across libraries, though.
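
For anyone curious what word, phrase, boolean and NEAR queries look like in practice, here is a minimal standalone sketch using SQLite's FTS5 module from Python. It is not Calibre's code, and Calibre's own index and query syntax may differ; the table name `books`, the helper names and the sample texts are all made up for illustration. It also assumes your Python's SQLite build has FTS5 enabled.

```python
import sqlite3


def build_index(docs):
    """Create an in-memory FTS5 index from a {title: text} mapping (illustrative only)."""
    con = sqlite3.connect(":memory:")
    # title is stored but not indexed, so queries only match the body text.
    con.execute("CREATE VIRTUAL TABLE books USING fts5(title UNINDEXED, body)")
    con.executemany("INSERT INTO books (title, body) VALUES (?, ?)", docs.items())
    return con


def search(con, query):
    """Return (title, snippet) pairs for an FTS5 query, best match first."""
    rows = con.execute(
        "SELECT title, snippet(books, 1, '[', ']', '...', 8) "
        "FROM books WHERE books MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


# Hypothetical sample library.
DOCS = {
    "Notes A": "Calibre offers full text search across every PDF in the library",
    "Notes B": "Zotero and Calibre both manage research papers",
    "Notes C": "A scanned PDF needs OCR before any text search can work",
}

if __name__ == "__main__":
    con = build_index(DOCS)
    for query in (
        "zotero",                   # single word
        '"full text search"',       # exact phrase
        "calibre NOT zotero",       # boolean operators
        "NEAR(scanned ocr, 3)",     # terms within 3 tokens of each other
    ):
        print(query, "->", search(con, query))
```

The snippet column plays roughly the role of the match list: it shows a bit of surrounding text with the hit in brackets.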


AlphaGeekZulu 949 days ago

Note: As far as I know, Calibre does not do OCR, so a PDF that contains only scanned pages will not be searchable.


ggpsv 949 days ago

I've had good luck using Tesseract [0] for scanned PDFs, including ones downloaded from archive.org. If you're not CLI-inclined, there are several GUIs available for it [1].

Did not know about Calibre for this - I was relying on opening each file and searching it individually.

[0]: https://github.com/tesseract-ocr/tesseract
[1]: https://www.opait.com/tessstudio/


kristofferR 949 days ago

OCRmyPDF is a tool built on Tesseract and designed specifically for PDFs. I would recommend it over plain Tesseract.

https://github.com/ocrmypdf/OCRmyPDF
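
A rough sketch of how a folder of scanned PDFs could be batch-processed before importing them into Calibre, assuming the `ocrmypdf` command is installed and on the PATH. The function names and folder layout are hypothetical. `--skip-text` tells OCRmyPDF to leave pages alone that already have a text layer.

```python
import subprocess
from pathlib import Path


def ocr_command(src, dst):
    """Build the ocrmypdf command line for one file."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_folder(in_dir, out_dir, run=subprocess.run):
    """OCR every PDF in in_dir into out_dir, skipping files already processed.

    `run` is injectable so the function can be tried without ocrmypdf installed.
    Returns the list of output paths that were newly created.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    done = []
    for src in sorted(Path(in_dir).glob("*.pdf")):
        dst = out_dir / src.name
        if dst.exists():
            continue  # already processed on an earlier run
        run(ocr_command(src, dst), check=True)
        done.append(dst)
    return done


if __name__ == "__main__":
    # Hypothetical folder names; adjust to your own layout.
    for path in ocr_folder("scans", "scans_ocr"):
        print("wrote", path)
```

The output files carry a text layer, so a full-text index should be able to pick them up like any other PDF.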


kristofferR 949 days ago

I recommend running any such PDFs through OCRmyPDF.

https://github.com/ocrmypdf/OCRmyPDF


Fluorescence 949 days ago

Oh, lol, I never knew that.

I wrote a script that dumped my library of thousands of books to .txt files and then grepped them.
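
Something along these lines, as a minimal sketch rather than the actual script: it assumes the books have already been converted to .txt files somewhere under one folder (the conversion step is not shown), and the function name and folder path are made up.

```python
import re
from pathlib import Path


def grep_library(root, pattern, ignore_case=True):
    """Yield (path, line_number, line) for every matching line in .txt files under root."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps the scan going if a dump has odd encoding.
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Hypothetical dump folder.
    for path, lineno, line in grep_library("library_txt", r"full.text search"):
        print(f"{path}:{lineno}: {line}")
```

This rescans every file on each query, which is the main thing a prebuilt index avoids.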


silentguy 948 days ago

If you want even faster search across different formats, you can try ripgrep-all ( https://github.com/phiresky/ripgrep-all ). It can search across epub, docx, pdf, zip, mp4, etc. If you are handy with the tool, you can write a custom adapter to search images using OCR with Tesseract.


breakds 949 days ago

Curious about "research tools" - do you use it to manage research papers?


account-5 949 days ago

I use Zotero for this. I was unaware that Calibre could do the search part.


Pepperating 949 days ago

Could you tell me what software you are using for syncing?


AlphaGeekZulu 949 days ago

I am using "Calibre Sync" on Android (a paid app: US$ 5.49 and worth every penny).

For viewing/annotating on Android I use Flexcil.


Novosell 949 days ago

Why not just use something like Syncthing? Does it offer more advanced features than just syncing files?


AlphaGeekZulu 949 days ago

> Does it offer more advanced features than just syncing files?

Yes! Covers, metadata (searchable, including custom columns and comments), reading progress, filters (important for me: tags), different layouts, virtual views.

And it can use Calibre's full-text search (but only when connected to the Calibre content server).
