by IvanK_net 1244 days ago
I wrote a free PDF editor (open a PDF, edit, export a PDF). My users edit around 500,000 PDF files every month.

I have been gradually improving it for the past five years. It is a part of my photo editor https://www.Photopea.com. I know a great deal about PDF; I wish I didn't know that much :D I am glad to see that there are others who try to "make sense" of PDF files instead of just rendering them :)

** Fun fact: Often, a PDF stores text as an array of characters, each with its own X and Y coordinates and a style (whitespace characters omitted). It is up to you to "cluster" them into words, lines, paragraphs ...
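
To make the clustering idea concrete, here is a minimal sketch in Python. It is not how Photopea does it; the `Glyph` record, the `y_tol` and `gap_factor` thresholds, and the function names are all assumptions made up for illustration. It groups characters into lines by similar baseline, then inserts a space wherever the horizontal gap between neighbors is large.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one positioned character."""
    char: str
    x: float      # left edge
    y: float      # baseline (PDF convention: larger y is higher on the page)
    width: float  # advance width


def cluster_lines(glyphs, y_tol=2.0):
    """Group glyphs whose baseline is within y_tol of the line's first glyph."""
    lines = []
    # Visit glyphs top to bottom, so each line is completed before the next starts.
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if lines and abs(lines[-1][0].y - g.y) <= y_tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    # Within a line, reading order is left to right.
    return [sorted(line, key=lambda g: g.x) for line in lines]


def line_to_text(line, gap_factor=0.3):
    """Join a line's glyphs, adding a space where the gap is wide enough."""
    out = []
    prev = None
    for g in line:
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_factor * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


def extract_text(glyphs):
    """Rebuild plain text, one output line per detected text line."""
    return "\n".join(line_to_text(line) for line in cluster_lines(glyphs))
```

Real files need more than this (rotated text, multiple columns, varying font sizes), so treat the fixed thresholds as placeholders.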

** Often, PDF text is made uneditable (on purpose). You see the text "Hello", but in fact the stored text is "bsiin", along with a font that renders "b" with a shape that looks like the letter "H", "s" as "e", and so on. If you open that PDF in a PDF viewer, select "Hello" and copy-paste it elsewhere, you get "bsiin".
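
A toy model of that trick, in Python. The `glyph_shape` table is a made-up stand-in for the font: it says which letter each stored character looks like when drawn. In a real file nobody hands you this table; you would have to infer it from the glyph outlines, which is the hard part.

```python
# Hypothetical font: stored character -> letter its glyph visually resembles.
glyph_shape = {"b": "H", "s": "e", "i": "l", "n": "o"}

stored_text = "bsiin"  # what is actually inside the PDF


def copy_paste(stored):
    """Copying from a viewer yields the stored characters, not the shapes."""
    return stored


def what_you_see(stored, shapes):
    """What the page shows: each stored character drawn with its glyph."""
    return "".join(shapes[c] for c in stored)


print(copy_paste(stored_text))                 # bsiin
print(what_you_see(stored_text, glyph_shape))  # Hello
```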

3 comments

Photopea is fantastic. I don't use Photoshop enough to justify a cloud subscription, and Adobe has shut down the licensing service for the version I have on disc (CS3).

https://community.adobe.com/t5/photoshop-ecosystem-discussio...

Photopea is a great solution, and I'm both glad it exists and that you are able to solve your issues using it.

But the fact that we as a society have accepted

> Adobe has shut down the licensing service for the version I have on disc (CS3).

as something normal and acceptable is insane to me.

I haven't accepted it. I sail harder all the time. The Adobe Creative Suite hasn't been a recent priority, but I should look it up on principle. Thank you.

Photopea also helps when you're at a random computer and can help someone do an edit that would otherwise require access to a computer with software installed.

I also had some exposure to PDF, and looking back, it's almost better to render it and then run OCR on the rendered page.

> render

I think you meant raster

How do you deal with scanned PDFs?

It is usually a PDF containing a single JPG image, which you can see and export at the original resolution.

To edit it, I guess you could paint over the text with white and add new text on top of it.
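
The "paint over with white" half of that can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the page is a grayscale image held as a list of pixel rows, and `box` is a hypothetical (left, top, right, bottom) rectangle around the old text. Drawing the new text on top would need a font renderer and is left out.

```python
def white_out(pixels, box, white=255):
    """Return a copy of the image with the given rectangle filled with white.

    pixels: list of rows, each a list of grayscale values (0-255).
    box: (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    out = [row[:] for row in pixels]  # copy, so the original scan is untouched
    for y in range(max(top, 0), min(bottom, len(out))):
        for x in range(max(left, 0), min(right, len(out[y]))):
            out[y][x] = white
    return out
```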