by msla 2063 days ago
DjVu is a great format for scanned images, which is its primary use-case, but I'm not seeing where you can have actual, selectable text in a DjVu document, like you can with PDF and PostScript. It seems like it's all images.
2 comments

> 3.3.2 Hidden text

> Every DjVu image optionally includes a hidden text layer that associates graphical features with the corresponding text. The hidden text layer is usually generated by running Optical Character Recognition software. This textual information provides for indexing DjVu documents and copying/pasting text from DjVu page images.

I copied that text from the DjVu spec, which is in the DjVu format.
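
To make the idea in that passage concrete, here is a rough Python sketch of what a hidden text layer does conceptually: each recognized word is tied to a rectangle on the page image, so a viewer can search the text and turn a mouse selection into a string. This is not the DjVu on-disk encoding; the `Word` class, the `select_text` and `find` helpers, and the coordinates are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR'd word plus the rectangle it occupies on the page image.

    Hypothetical structure for illustration, not the DjVu chunk layout.
    """
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def select_text(words, x0, y0, x1, y1):
    """Return the text of every word whose box overlaps the selection rectangle."""
    hits = [
        w for w in words
        if w.x0 < x1 and w.x1 > x0 and w.y0 < y1 and w.y1 > y0
    ]
    return " ".join(w.text for w in hits)


def find(words, query):
    """Return the boxes of all words equal to the query, ignoring case."""
    q = query.lower()
    return [(w.x0, w.y0, w.x1, w.y1) for w in words if w.text.lower() == q]


# A made-up page with three words on one line of the scan.
page = [
    Word("Hidden", 100, 700, 220, 730),
    Word("text", 230, 700, 300, 730),
    Word("layer", 310, 700, 400, 730),
]

# Dragging a selection over the first two words yields their text.
selection = select_text(page, 0, 690, 305, 740)

# Searching returns where to draw the highlight on the image.
matches = find(page, "TEXT")
```

The page still displays as a scanned image; the words are only used for lookup, which is why the layer is called hidden. If you have DjVuLibre installed, its `djvutxt` tool will dump this layer from a real file.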

I have not read the specification, but the DjVu format must have a way to store plain text alongside the images, and that mechanism is frequently used.

I do not remember ever reading a DjVu file that did not allow searching and selecting the text, whereas PDF files that allow neither, because they store only the scanned images, are quite common.