Hacker News
by weinzierl 3066 days ago
I’m a heavy user of Qubes OS. The “Convert to Trusted PDF” feature is something I use almost daily.

My use case is examining, cleaning and possibly distributing application letters and CVs. If you have to read job application letters, the advice to open only files from people you trust just doesn’t work. The amount of untargeted malware we receive through this channel is considerable. We have had targeted attacks too.

I’ve known about Qubes OS for a long time but interestingly the advice to use it for all processing of application letters didn’t come from my tech circles but from a recruiter.

Given the strict laws about data retention in my jurisdiction (Germany), a cloud solution (short of homomorphic encryption) probably isn’t going to work for me. The idea of using discrete devices sounds interesting though.


Have you considered a jailed PDF reader application instead? I'm curious which decision factors were important for you.
So everyone downstream of weinzierl (1) has to be aware that the PDFs he hands them may be full of malware, (2) has to use VMs, (3) must open said malware-packed PDFs in a disposable VM, and (4) must strictly adhere to D-VM usage protocol.
Or weinzierl could print them and possibly rescan for further distribution.
Which is basically what Qubes "Convert to trusted PDF" does.
Convert to DjVu.
My first solution would be improving reader security by starting with one with decent code (Espie suggested MuPDF), compiling it with something that makes it memory-safe, and running it in a sandbox on a separation kernel (e.g. Genode or Muen). Then, a memory-safe conversion tool turns it into something more trustworthy. This might even be batched on simple hardware which itself has a lower attack surface. Later on, it could move to secure hardware like a CHERI CPU, albeit that can happen today if you have an FPGA board and the skills to run their HDL code.

For fun, though, I'll dust off an old concept since you're talking printing. One might start by printing them to a virtual screen like in Nitpicker GUI with the untrusted reader. Aside from isolation, there could be a feature to convert what's on the virtual screen or page into a compressed image. A PDF with N pages becomes a zip of N images or a single image of some size. That itself could be distributed to run in the trusted, safe viewers we already should have, right? ;) It might also be run back through similarly-deprivileged OCR to turn into a safer format. Gotta eyeball it if doing it that way. That said, there are fonts that work well with OCR that it might be converted to as part of image production if OCR is the goal in the first place.
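
A minimal sketch of the "PDF with N pages becomes a zip of N images" step, assuming the untrusted reader has already rendered each page to raw RGB pixels. The function names, the PPM output format and the `page-0001.ppm` naming scheme are illustrative assumptions, not part of any existing tool:

```python
import io
import zipfile
from typing import List, Tuple


def ppm_bytes(width: int, height: int, rgb: bytes) -> bytes:
    """Encode raw 8-bit RGB pixels as a binary PPM (P6) image."""
    if len(rgb) != width * height * 3:
        raise ValueError("pixel data does not match dimensions")
    return b"P6\n%d %d\n255\n" % (width, height) + rgb


def pages_to_zip(pages: List[Tuple[int, int, bytes]]) -> bytes:
    """Pack N rendered pages into one zip archive, one image per page."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for number, (width, height, rgb) in enumerate(pages, start=1):
            zf.writestr(f"page-{number:04d}.ppm", ppm_bytes(width, height, rgb))
    return buf.getvalue()


# Two tiny stand-in "pages": a 2x1 white page and a 1x1 black page.
archive = pages_to_zip([(2, 1, b"\xff" * 6), (1, 1, b"\x00" * 3)])
```

PPM is used here only because it is trivial to write and to parse; any simple bitmap format would serve the same purpose.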

Could be a fun, little project teaching folks about a number of topics at once.

Your "first solution" would be to take a de novo PDF implementation written in C, "compile it with something that makes it memory-safe", and then port it to an L4 microkernel. Maybe bust out some HDL and get parts of it deployed directly onto an FPGA.

Got it.

I said a separation kernel, like the FOSS projects and commercial products dating back to 2005 that I told Joanna about on the Qubes mailing list, which were compartmentalizing things on security-focused kernels. Aside from a small TCB, they have optional mitigations for storage and timing channels. Aside from isolation, a standard practice on the embedded side was including safe subsets of Java or Ada running right on the kernel to implement specific components more safely. So, basically just what was standard, deployed practice in high security over a decade ago.

Optionally, I also pointed out people interested in developing solutions have options available now for safety or security on CPU side, too. They can do software, hardware, mix of both, whatever suits their purposes.

> For fun, though, I'll dust off an old concept since you're talking printing. One might start by printing them to a virtual screen like in Nitpicker GUI with the untrusted reader. Aside from isolation, there could be a feature to convert what's on the virtual screen or page into a compressed image. A PDF with N pages becomes a zip of N images or a single image of some size. That itself could be distributed to run in the trusted, safe viewers we already should have, right?

Which is literally what Qubes "Convert to trusted PDF" does.
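
The reason the print-and-rescan approach helps is that the trusted side only ever has to parse something trivially simple. A rough sketch of that receiving side, under stated assumptions: the header format (`b"WIDTH HEIGHT"`), the `MAX_DIM` limit and the function name are hypothetical and are not taken from Qubes' actual code:

```python
import re

MAX_DIM = 10_000  # hypothetical sanity limit on page width/height in pixels
# Strict header grammar: two positive decimal integers separated by one space.
HEADER_RE = re.compile(rb"([1-9][0-9]{0,4}) ([1-9][0-9]{0,4})")


def accept_page(header: bytes, payload: bytes):
    """Validate one page sent by the untrusted renderer.

    Returns (width, height, pixels) or raises ValueError. Nothing in the
    payload is interpreted; it is only length-checked as raw RGB pixels.
    """
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError("malformed header")
    width, height = int(match.group(1)), int(match.group(2))
    if width > MAX_DIM or height > MAX_DIM:
        raise ValueError("page dimensions too large")
    if len(payload) != width * height * 3:
        raise ValueError("payload length does not match dimensions")
    return width, height, payload
```

Whatever the untrusted side sends, the worst it can do under this scheme is produce an ugly picture; there is no structure left for a malicious file to hide in.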

> My first solution would be improving reader security by starting with one with decent code (Espie suggested MuPDF), compiling it with something that makes it memory-safe, and running it in a sandbox on a separation kernel (e.g. Genode or Muen). Then, a memory-safe conversion tool turns it into something more trustworthy.

It would of course be preferable to have a secure PDF reader to begin with, but the complexity of the PDF format isn't really conducive to that.

Oh, that's neat that it's what they're doing. As far as a secure PDF reader goes, you can definitely reduce the risks it poses with mitigations, which reduce headaches when they don't reduce attacks. The ones I was thinking of are doing it with acceptable overheads these days. On the far end, the CPU solution already compiles legacy C to run capability-secure on FreeBSD, with the OS and CPU available to download and run. Just gotta buy the board, which has other uses.
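
To make "capability-secure" a bit more concrete, here is a toy model of the idea: every pointer carries its own bounds, and each access is checked against them. This is an illustrative sketch only. The class and method names are made up, and it does not reflect the actual CHERI instruction set or any real compiler's instrumentation:

```python
class BoundsError(Exception):
    """Raised where unchecked C would silently read or write out of bounds."""


class Capability:
    """Toy fat pointer: a window into memory that carries its own bounds."""

    def __init__(self, memory: bytearray, base: int, length: int):
        if base < 0 or length < 0 or base + length > len(memory):
            raise BoundsError("capability exceeds backing memory")
        self._memory = memory
        self._base = base
        self._length = length

    def _check(self, offset: int) -> None:
        # The check that hardware or an instrumenting compiler would insert.
        if not 0 <= offset < self._length:
            raise BoundsError(f"offset {offset} outside [0, {self._length})")

    def load(self, offset: int) -> int:
        self._check(offset)
        return self._memory[self._base + offset]

    def store(self, offset: int, value: int) -> None:
        self._check(offset)
        self._memory[self._base + offset] = value


memory = bytearray(16)
buf = Capability(memory, base=4, length=8)  # an 8-byte "allocation"
buf.store(0, 65)  # in bounds: writes memory[4]
```

Note what this does and does not buy: an out-of-bounds access becomes a clean failure instead of memory corruption, but the bug in the program is still there.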

So, there are more possibilities to explore on top of these existing solutions.

> It would of course be preferable to have a secure PDF reader to begin with, but the complexity of the PDF format isn't really conducive to that.

pdf.js exists!

> compiling it with something that makes it memory-safe

Isn't it true that there is a bit more work to do to a program to make it memory safe than just recompiling?

Like if the original is in C, recompiling it in C++ won't whisk away unsafe memory access without significant architectural rework, no?

It's nonsense. If it were possible to make C++ memory-safe with a special compiler, it would have been done long ago.
Or, you know, you could use pdf.js, which has two advantages: (1) it already exists; (2) it can exist, unlike your proposal, which involves using a nonexistent memory-safe C++ compiler.