Hacker News
by marianminds 4313 days ago
One thing that still really sucks is the conversion of PDFs (e.g. journal articles) into formats suitable for e-ink readers. I've tinkered with its heuristic processing and regex formatting, but I'd never considered manually touching up the final .epub. If their ebook editor is any good I might start reading journal articles again.
5 comments

I think it would make much more sense to instead have the journals publish articles in more formats, rather than just HTML and PDF. For example, they should offer them in EPUB[0].

Since EPUB is much more accessible to blind/visually impaired people than PDF, perhaps the federal government could step in and mandate that all articles with content produced using federal grants must be available in a format that the blind/visually impaired can consume as well.

0: http://scholarlykitchen.sspnet.org/2013/03/19/is-it-time-for...

This converter is reasonable: http://www.willus.com/k2pdfopt/ I wonder if someone has done a Calibre plugin for it.
> "K2pdfopt works by converting each page of the PDF/DJVU file to a bitmap and then scanning the bitmap for viewable areas (rectangular regions) and cutting and cropping these regions and assembling them into multiple smaller pages without excess margins so that the viewing region is maximized. Making use of this method, k2pdfopt can re-flow text lines, even on scanned documents"
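To make the quoted description concrete, here is a minimal sketch of the "scan the bitmap for viewable areas and crop" idea. It is not k2pdfopt's actual code: the page is assumed to be already rasterized into a grid of 0/1 pixels, and the names `content_bbox`, `split_rows` and `crop` are made up for illustration.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # assumption: 1 = ink (dark pixel), 0 = blank paper


def content_bbox(page: Bitmap) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the inked area; bottom/right are exclusive."""
    width = len(page[0]) if page else 0
    rows = [y for y, row in enumerate(page) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in page)]
    if not rows:
        return (0, 0, 0, 0)  # blank page: nothing to keep
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def split_rows(page: Bitmap) -> List[Tuple[int, int]]:
    """Split the page into horizontal bands of ink separated by fully blank rows."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(page):
        has_ink = any(row)
        if has_ink and start is None:
            start = y  # a new band (e.g. a text line) begins here
        elif not has_ink and start is not None:
            bands.append((start, y))  # band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(page)))
    return bands


def crop(page: Bitmap, box: Tuple[int, int, int, int]) -> Bitmap:
    """Cut the given (top, left, bottom, right) rectangle out of the page."""
    top, left, bottom, right = box
    return [row[left:right] for row in page[top:bottom]]


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
trimmed = crop(page, content_bbox(page))  # same content, margins removed
```

A real tool would apply the same projection trick to columns as well, to detect the gutter in two-column papers, and then re-pack the cropped bands into pages sized for the reader's screen.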

Looks promising. Hopefully this would also remove JavaScript and executable code from the source PDF, although any exploits may run within the context of the converter. To be safe, conversion could be run from a live CD.

More information on analysis of PDF malware: http://blog.didierstevens.com/programs/pdf-tools/

PDF malware can be used for economic espionage targeting commercial research. What would help is a single open registry which has: bibliographic metadata + hash of known-good PDF for each paper.
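A rough sketch of what a lookup against such a registry could look like, assuming it is keyed by DOI and stores a SHA-256 digest of the known-good file. No such registry exists as far as I know; the dictionary layout, the function names and the example DOI below are all hypothetical.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical registry layout: DOI -> bibliographic metadata plus the
# SHA-256 digest of the known-good PDF for that paper.
Registry = Dict[str, Dict[str, str]]


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def register(registry: Registry, doi: str, title: str, pdf_bytes: bytes) -> None:
    """Record the metadata and digest for a trusted copy of the paper."""
    registry[doi] = {"title": title, "sha256": sha256_hex(pdf_bytes)}


def is_known_good(registry: Registry, doi: str, pdf_bytes: bytes) -> Optional[bool]:
    """True if the digest matches, False if it differs, None if the DOI is unlisted."""
    entry = registry.get(doi)
    if entry is None:
        return None
    return entry["sha256"] == sha256_hex(pdf_bytes)


registry: Registry = {}
good_copy = b"%PDF-1.4 trusted copy"  # stand-in bytes, not a real PDF
register(registry, "10.1000/example", "An Example Paper", good_copy)
```

A match only shows that the file is byte-identical to the registered copy; whether that copy is actually clean still depends on whoever registered it.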

Hey, that's pretty neat, I was just thinking it shouldn't be that hard to do something like that. I would love to be able to read academic papers on my Kindle Paperwhite, this might help with that. Reading on a regular tablet is a bit annoying at times.
I've used k2pdfopt for reading two-column formatted academic papers on a Kindle Paperwhite, and it works great.
If you use Mendeley for organising your papers, check out KinSync.com. It's pretty good for this.
Thanks for the information. This looks pretty good, will give it a whirl. Glad to find it already on the Arch Linux AUR.

Edit: I gave it a test run, and found it does the job very well. Thank you again!

Thanks for sharing. I didn't know that one. Until now I've been using briss:

http://sourceforge.net/projects/briss/

I found that the easiest/best way to read PDFs on an e-reader is by extracting the text with PDFminer. It throws away the images, and the formatting often sucks, but at least you can read the text pretty well. I didn't try 2-column journal articles, so maybe it doesn't work for those, though.

I tried all sorts of other things, but this was the least painful.
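For anyone wanting to try the text-extraction route, a minimal sketch follows. It assumes the maintained pdfminer.six fork, whose `pdfminer.high_level.extract_text` function returns the text of a PDF as one string; the `reflow` helper is my own illustration of a cleanup pass, not part of PDFminer.

```python
import re


def reflow(raw: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    cleaned = []
    for para in re.split(r"\n\s*\n", raw.strip()):
        # Rejoin words split across a line break: "lay-\nouts" -> "layouts".
        # Caveat: this also merges genuine compounds that happen to wrap at the hyphen.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse the remaining line breaks and whitespace runs into single spaces.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)


def pdf_to_text(path: str) -> str:
    """Extract the text of a PDF and reflow it. Needs `pip install pdfminer.six`."""
    from pdfminer.high_level import extract_text  # imported lazily on purpose

    return reflow(extract_text(path))


sample = "Reading PDFs on e-ink is a\nchore because of fixed lay-\nouts.\n\nSecond para-\ngraph here.\n"
print(reflow(sample))
```

The result is plain text, so images and most structure are lost, as noted above. Two-column layouts may still come out interleaved depending on how the PDF stores its text.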

Epub editor is an HTML editor with live preview. Gets the job done.
The best way I know is to open the PDF in Adobe Acrobat and export to HTML. Then convert the HTML to EPUB or MOBI in Calibre.
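The HTML-to-EPUB step can also be scripted, since Calibre ships a command-line tool called `ebook-convert` that picks the output format from the output file's extension. The sketch below only covers the basic two-argument form; the wrapper function names are hypothetical, and it assumes `ebook-convert` is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def convert_command(html_path: str, fmt: str = "epub") -> List[str]:
    """Build the ebook-convert argument list: input file, then output file."""
    src = Path(html_path)
    # ebook-convert infers the target format from the output extension.
    return ["ebook-convert", str(src), str(src.with_suffix("." + fmt))]


def convert(html_path: str, fmt: str = "epub") -> Path:
    """Run the conversion and return the path of the generated ebook."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    cmd = convert_command(html_path, fmt)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[2])


print(convert_command("paper.html"))
```

Calling `convert("paper.html", "mobi")` would write `paper.mobi` next to the input file, provided Calibre is installed.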