Hacker News
by dpapathanasiou 5923 days ago
That's a non-trivial task.

Yes, that's true.

I only bring it up b/c if your goal is to turn pdfcrowd into an app that people would pay money for (and I would be one of them), solving that problem would go a long way towards achieving it.

3 comments

Solving it perfectly is non-trivial (I've known entire PhDs to be spent working on a small subset of the problem). There are a number of products/projects that solve it to some extent (techniques include absolute positioning & making sweeping assumptions about what constitutes a paragraph). Would that be enough for you to consider paying for, given that their assumptions/workarounds might produce HTML files that aren't quite to your 'taste'?
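
To make those two techniques concrete, here's a rough Python sketch. Everything in it is made up for illustration and not taken from any real product: the `Line` record, the function names, and the `gap_factor` threshold are all assumptions, and it presumes the text lines and their coordinates have already been pulled out of the PDF by some other tool.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Line:
    """Hypothetical extracted text line; coordinates in points from the top-left."""
    x: float
    y: float
    height: float
    text: str


def to_absolute_html(lines):
    """Absolute positioning: pin every line at its original coordinates."""
    return "\n".join(
        f'<div style="position:absolute;left:{ln.x:g}pt;top:{ln.y:g}pt">'
        f"{escape(ln.text)}</div>"
        for ln in lines
    )


def group_paragraphs(lines, gap_factor=1.5):
    """Sweeping assumption: a new paragraph starts wherever the vertical gap
    between consecutive lines exceeds gap_factor times the line height."""
    paragraphs = []
    previous = None
    for ln in sorted(lines, key=lambda item: item.y):
        if previous is None or ln.y - previous.y > gap_factor * previous.height:
            paragraphs.append([])  # gap too large (or first line): new paragraph
        paragraphs[-1].append(ln.text)
        previous = ln
    return [" ".join(parts) for parts in paragraphs]


def to_flow_html(lines):
    """Emit ordinary <p> elements from the guessed paragraphs."""
    return "\n".join(f"<p>{escape(p)}</p>" for p in group_paragraphs(lines))


sample = [
    Line(72, 72, 12, "First line of a paragraph"),
    Line(72, 86, 12, "that wraps onto a second line."),
    Line(72, 120, 12, "A second paragraph & more."),
]

print(to_absolute_html(sample))
print(to_flow_html(sample))
```

The absolute version keeps the layout but gives you HTML that doesn't reflow; the paragraph version reflows but falls apart on multi-column pages, tables, or unusual line spacing, which is where the 'taste' issue comes in.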
There are already many apps and pieces of software that charge for the feature he already has, so I don't see why it is a requirement for him to monetize. It definitely would be an easy feature to charge for, but I think what he has already has potential.
Total noob question: couldn't you programmatically capture a browsershot and then convert that into a PDF?

HTML -> png seems to have been figured out. Is .png -> pdf that hard to do?

No, .png to .pdf is not difficult.

I believe dpapathanasiou's suggestion is not to blindly convert a PDF into an HTML file containing one giant image of the PDF.

Instead, he wants to create an HTML document that maintains the same content and layout as the PDF.

D'Oh! Got myself mixed up there a bit.