Hacker News
by t-3 1205 days ago
> the content is static

Why are you saying this like it's a bad thing? Why would you want dynamic content in a document format?

You can't give someone an epub and tell them to read page 197. You can't look at a webpage and know how it will look when you print it out. You also can't just netcat an ebook to your printer and get useful output.

What alternatives do we really have? PDF is the most widely supported format outside of plain text (arguably more widely supported than text, actually, with phones being the major computing device). I haven't seen an ebook format that can correctly and consistently deal with code sections, inline images, quotes, font styling, paging, etc. (maybe more a software issue than a format problem, but if nobody can make it work right, the format isn't blameless).

3 comments

PDF should be replaced... by a document format that displays statically rendered content in page-size chunks, but with some of the backend problems removed.

Stuff like embedding text in a way that improves accessibility and makes copying text work nicely. Being able to right-click and copy images would also be nice.

Improving accessibility may be fair (I'm not familiar with the details), but in many cases PDFs are used for scenarios where the authors don't want the individual assets to be trivially extractable. Even if an advanced user can do so programmatically, a static layout is often a feature, not a bug. If the authors wanted the reader to be able to quickly pull out assets and recreate it they would distribute the content in some other format.
That's dumb, and pointless. Any image or text that is displayed on the screen will always be easily copied. There's no security in making copying an image take five clicks instead of two, it's just annoying. Not annoying because the five clicks are difficult or very time consuming, annoying because "why should I have to click 3 extra times just because someone foolishly thinks their content is being protected when it isn't". The analog hole is always there as a fallback too.

If someone wants something to be private, they shouldn't publish it. Once you put something out into the world, you should expect that people will do whatever they want with it.

Read-Only Docx solves pretty much all these issues.
Not really. Even Word for Windows and Word for Mac don't always render them identically, and in Office 365 it may be something else! And when you add LibreOffice Writer and the like into the equation, all bets are off.
> You also can't just netcat an ebook to your printer and get useful output.

Can you 'just netcat' a PDF to a printer? Which port number? How could I control features such as printing on both sides of the page? I imagined there would be some wrapping protocol (or conversion to PostScript?). Does a printer that receives just a PDF binary, and no other control signal, simply fall back to a default mode (like always printing single-sided)?

Yes, it works on most printers. My Brother laser printer has a menu option to print out its network settings, which lists ~20 different protocols and the ports they can be used on. I don't think you can control the print settings with netcat, but I'm probably wrong. Mostly I use lpr, but the netcat trick is very useful sometimes.
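
On the question of controlling settings such as duplex over a raw connection: one approach that may work on printers that understand PJL (HP's Printer Job Language) is to wrap the document in a small PJL header before sending it. The sketch below only builds the bytes; `wrap_with_pjl` is a hypothetical helper name, and whether a given printer honors `SET DUPLEX` or accepts `PDF` as a language is an assumption that has to be checked per model.

```python
# Universal Exit Language escape sequence that brackets a PJL job.
UEL = b"\x1b%-12345X"


def wrap_with_pjl(document: bytes, duplex: bool = True, language: str = "PDF") -> bytes:
    """Return the document wrapped in a minimal PJL header and trailer.

    Assumption: the target printer supports PJL and the named language.
    """
    header = (
        UEL
        + b"@PJL\r\n"  # "@PJL" must follow the UEL directly
        + b"@PJL SET DUPLEX=" + (b"ON" if duplex else b"OFF") + b"\r\n"
        + b"@PJL ENTER LANGUAGE=" + language.encode("ascii") + b"\r\n"
    )
    # The trailing UEL marks the end of the job.
    return header + document + UEL
```

The resulting bytes could then be sent the same way as an unwrapped PDF. Printers without PJL support would likely print the header as garbage or ignore it, so treat this as something to try, not a guarantee.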
Fascinating, I printed a network report from an HP printer and it listed several services, one of which was:

    9100 Printing = Enabled
So I guessed that I could netcat to port 9100

    cat something.pdf | nc 192.168.1.2 9100
It worked! The printer sat waiting for more content, so I had to press Ctrl-C to kill netcat, and then it immediately printed what had been sent. The margins were off (unlike when I print through a print driver).
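
The "sat waiting until Ctrl-C" part is presumably because netcat kept the connection open after the file ended. A minimal Python sketch of the same trick that closes the connection itself is below. `send_raw` is a made-up name, port 9100 is taken from the printer report above, and the assumption is that the printer starts printing once the sender closes its side.

```python
import socket
from pathlib import Path


def send_raw(path: str, host: str, port: int = 9100, timeout: float = 10.0) -> int:
    """Send a file's bytes to a raw print port, then close the connection.

    Returns the number of bytes sent. No print settings are transmitted;
    the printer is assumed to fall back to its own defaults.
    """
    data = Path(path).read_bytes()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(data)
        # Half-close the sending side so the receiver sees end-of-stream
        # instead of waiting for more data.
        sock.shutdown(socket.SHUT_WR)
    return len(data)
```

Calling `send_raw("something.pdf", "192.168.1.2")` would be the rough equivalent of the `cat ... | nc` line, minus the manual Ctrl-C. It does nothing about the margins, since the printer still receives no driver-level settings.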
You can if it supports PDF, which lots of modern networked printers do.
New and improved link rot: now updated to include document contents.