by tptacek 2362 days ago
It's linked right from the spec: it's a streaming AEAD construction, and getting that is literally one of the motivations behind the tool. It does not buffer the whole message in memory, or fail to detect truncation.

I think it's reasonable to ask for clarity on this point. One might, reasonably imo, expect buffering (and E2BIG if a size limit is exceeded). You have to dive deeper into the spec to learn this than a genre-savvy user should need to.
While the underlying algorithm detects truncation/corruption, the specification does not describe how the command line tool signals this to the caller.

This is problematic since a caller needs to be aware of the need to appropriately handle truncated plaintext output. The readme needs to warn about this pitfall.

I think that you misunderstood nullc's question. They are asking what happens if at some point one of the poly1305 MACs in the file is incorrect. Not if someone truncates the file.
We're talking about the same thing.

I saw that it used a streaming AEAD, but that's actually what inspired my question.

Since (from the GitHub page) it reads stdin, it can't two-pass the file.

So it appears that if you hand it a file with midstream corruption it's going to feed a truncated input down your pipeline.

That has consequences. They may well be less serious consequences than buffering a potentially unlimited amount of data in memory :), but it's useful to make the behavior very clear because it wouldn't require too advanced an idiot to make something that was exploitable on this basis.

It's Rogaway's STREAM scheme from https://eprint.iacr.org/2015/189.pdf. Are you pointing out a problem in the paper, or in some specific idiosyncrasy you see of how it's implemented here? If so: what is it?

The AGL post the spec links to directly talks more generally about the high-level strategy: you're buffering chunks of files. You're only ever releasing authenticated plaintext. If you're piping to something processing plaintext on-line, that thing might need to wait for the end-of-file signal before processing or else potentially operate on a truncated file (by some integral number of chunks). `age` is still just a Unix program.
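A toy sketch of the chunked, STREAM-style layout described above may make the "only ever releasing authenticated plaintext" point concrete. Assumptions, loudly: SHA-256 in counter mode and HMAC-SHA256 stand in for ChaCha20 and Poly1305, the 16-byte chunk size is chosen only so a short demo spans several chunks, and `encrypt`/`decrypt` are hypothetical names. The 11-byte counter plus 1-byte last-chunk flag mirrors the nonce layout in the age spec, but this is not age's format or code and is not secure for real use.

```python
import hashlib
import hmac
from typing import Iterator

CHUNK = 16  # toy chunk size so a short demo spans several chunks (age's spec uses 64 KiB)
TAG = 32    # HMAC-SHA256 tag length (real Poly1305 tags are 16 bytes)

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode); a stand-in for ChaCha20."""
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def _tag(key: bytes, nonce: bytes, ct: bytes) -> bytes:
    """Toy MAC (HMAC-SHA256 under a derived key); a stand-in for Poly1305."""
    mac_key = hashlib.sha256(b"mac" + key).digest()
    return hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()

def _nonce(index: int, last: bool) -> bytes:
    """STREAM-style nonce: big-endian chunk counter, then a last-chunk flag byte."""
    return index.to_bytes(11, "big") + (b"\x01" if last else b"\x00")

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # An empty message still produces one (empty) final chunk.
    chunks = [plaintext[i:i + CHUNK] for i in range(0, len(plaintext), CHUNK)] or [b""]
    out = bytearray()
    for i, pt in enumerate(chunks):
        nonce = _nonce(i, last=(i == len(chunks) - 1))
        ct = bytes(a ^ b for a, b in zip(pt, _keystream(key, nonce, len(pt))))
        out += ct + _tag(key, nonce, ct)
    return bytes(out)

def decrypt(key: bytes, data: bytes) -> Iterator[bytes]:
    """Yield plaintext one chunk at a time, each only after its own tag verifies."""
    pos, i = 0, 0
    while True:
        sealed = data[pos:pos + CHUNK + TAG]
        pos += len(sealed)
        at_end = pos >= len(data)  # the chunk that hits end of input must be flagged last
        if len(sealed) < TAG:
            raise ValueError("unexpected EOF")
        ct, tag = sealed[:-TAG], sealed[-TAG:]
        nonce = _nonce(i, last=at_end)
        if not hmac.compare_digest(tag, _tag(key, nonce, ct)):
            if at_end and hmac.compare_digest(tag, _tag(key, _nonce(i, False), ct)):
                # A valid chunk that was not sealed as the last one: cut on a chunk boundary.
                raise ValueError("unexpected EOF")
            raise ValueError("message authentication failed")
        yield bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))
        if at_end:
            return
        i += 1
```

Because `decrypt` is a generator, a caller that writes each chunk out as it arrives will already have emitted the first chunk when a corrupted second chunk raises, and a stream cut on a chunk boundary fails because its final chunk was sealed without the last-chunk flag. That is the truncated-output case this thread is about.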

My question was asking to confirm that it indeed will put out a truncated output when given a mid-stream corrupted input (and that it doesn't do something like buffer just to validate).

That behavior should be clearly documented, so that users can be advised that their pipelines need to safely handle that case.

> that thing might need to wait for the end-of-file signal before processing or else potentially operate on a truncated file

Exactly. The docs should say this clearly, or someone will manage to create an interesting vulnerability with it eventually. :)

Could go with a message that points out that encryption doesn't authenticate the source. Assuming otherwise is a not uncommon misuse that shows up with PGP, where people assume the source is authentic if the input was encrypted, even when no signature is used. (The fact that corrupted input gives an "authentication failed" message might be particularly misleading.)
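To see why a successful decryption says nothing about the sender, here is a toy sketch of anonymous public-key encryption: an ephemeral Diffie-Hellman key agreement followed by encrypt-then-MAC. Assumptions: the 127-bit prime group with base 3 stands in for X25519 and is not secure, SHA-256 counter mode and HMAC-SHA256 stand in for the real cipher and MAC, and all function names are hypothetical. `encrypt_to` needs only the recipient's public key, so a tag that verifies proves the ciphertext wasn't modified after it was made, not who made it.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Toy Diffie-Hellman group: NOT secure parameters; real tools use X25519.
P = 2**127 - 1  # a Mersenne prime
G = 3

def keypair() -> Tuple[int, int]:
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def _derive(shared: int) -> Tuple[bytes, bytes]:
    """Split the shared secret into separate encryption and MAC keys."""
    s = shared.to_bytes(16, "big")
    return hashlib.sha256(b"enc" + s).digest(), hashlib.sha256(b"mac" + s).digest()

def _xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 counter mode); the same call encrypts and decrypts."""
    ks = bytearray()
    ctr = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(x ^ y for x, y in zip(data, ks))

def encrypt_to(recipient_pub: int, plaintext: bytes) -> bytes:
    """Anyone holding the recipient's PUBLIC key can run this; no sender secret is involved."""
    eph_priv, eph_pub = keypair()
    enc_key, mac_key = _derive(pow(recipient_pub, eph_priv, P))
    header = eph_pub.to_bytes(16, "big")
    ct = _xor_stream(enc_key, plaintext)
    return header + ct + hmac.new(mac_key, header + ct, hashlib.sha256).digest()

def decrypt(recipient_priv: int, blob: bytes) -> Optional[bytes]:
    """Return None if the tag fails. Success means 'unmodified', not 'from someone trusted'."""
    header, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    enc_key, mac_key = _derive(pow(int.from_bytes(header, "big"), recipient_priv, P))
    if not hmac.compare_digest(tag, hmac.new(mac_key, header + ct, hashlib.sha256).digest()):
        return None
    return _xor_stream(enc_key, ct)
```

A message produced by any stranger with `encrypt_to(alice_pub, ...)` decrypts cleanly under Alice's private key; only modifying the blob afterwards makes `decrypt` return `None`.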

It's streaming on-line encryption. That's literally the point of streaming encryption: not buffering whole messages. The rest of your point directly follows from "not buffering whole messages".
Indeed. And neither the readme nor the usage output makes any mention of streaming, buffering, on-line, authentication, or anything related.

This is a potential security relevant behavior that most users-- who haven't written or analyzed tools like this-- would find surprising.

For those following along, I went and tested it-- since the behavior wasn't documented or clear from the code. If it encounters midstream corruption it truncates the output, exits with a non-zero return and prints some error text to stderr: "Error: chacha20poly1305: message authentication failed\n[ Did age not do what you expected? Could an error be more useful? Tell us: https://filippo.io/age/report ]"

If the input is truncated, it either does that-- or if the truncation is on a block boundary it prints "Error: unexpected EOF\n[ Did age not do what you expected? Could an error be more useful? Tell us: https://filippo.io/age/report ]" instead.
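Given the behavior tested above (a non-zero exit status plus whatever authenticated prefix was already written), one way for a pipeline to avoid acting on truncated plaintext is to hold the output aside and publish it only after the decryptor exits with status 0. A minimal Python sketch, assuming the decryptor writes plaintext to stdout; `decrypt_to_file` is a hypothetical helper, not part of age.

```python
import os
import subprocess
import tempfile
from typing import Sequence

def decrypt_to_file(cmd: Sequence[str], dest: str) -> None:
    """Run a decryption command that writes plaintext to stdout. The plaintext
    reaches `dest` only if the command exits with status 0; otherwise the
    partial output is deleted and an error is raised."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Temp file in the destination directory: same filesystem, so os.replace is atomic.
    fd, tmp = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            proc = subprocess.run(cmd, stdout=out, stderr=subprocess.PIPE)
        if proc.returncode != 0:
            err = proc.stderr.decode(errors="replace").strip()
            raise RuntimeError(f"decryption failed (exit {proc.returncode}): {err}")
        os.replace(tmp, dest)  # publish only the complete, authenticated plaintext
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # never leave a truncated prefix behind
        raise
```

For example, `decrypt_to_file(["age", "-d", "-i", "key.txt", "report.age"], "report.txt")`, assuming that invocation matches your setup. Readers of `report.txt` then see either nothing or the complete plaintext. The trade-off is the one discussed above: the downstream consumer waits for end-of-file instead of processing on-line.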

It's not a problem, but it should be documented.

The fact that this is the point of streaming encryption does not preclude the usefulness of pointing it out explicitly. It eliminates a reasoning step by spelling it out, which is always a good thing for critical things, IMO.
Serious question: If you're not signing (age does not*), then what is the point of the AEAD STREAM scheme? By definition, nothing is authenticated, right?
Consider this attack.

You found a vulnerability in FooSmith and want to collect a bounty. You're keeping the vuln secret both for security reasons, and so no one else can jump your claim.

FooSmith has announced a bounty process where you can claim a bounty by sending an encrypted message with a novel vulnerability according to a specified process.

So you send a report using the mandatory bounty collection form, which starts off with a fixed position field "Bitcoin address to pay bounty to: <address goes here>".

I happen to know what address you're going to use since you posted it so everyone could see when you got paid. I happen to have write access to FooSmith's issue tracker. I xor youraddress xor myaddress into the stream at the right position, and tada thanks to the fragility of stream ciphers, esp unauthenticated ones: it decrypts to a different message that asks for the payout to my address.
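The bit-flipping step above can be sketched with a toy stream cipher. Assumptions: SHA-256 in counter mode stands in for a real stream cipher (ChaCha20 or AES-CTR would behave identically here), an HMAC-SHA256 tag over the ciphertext plays the role of authentication, and the two addresses are placeholders of equal length. `flip` never touches the key; it only needs to know the plaintext at a fixed offset.

```python
import hashlib
import hmac
from typing import Optional

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy stream cipher: SHA-256 in counter mode."""
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    """Unauthenticated: ciphertext = plaintext XOR keystream (decryption is the same call)."""
    return xor(pt, keystream(key, nonce, len(pt)))

def _mac(key: bytes, nonce: bytes, ct: bytes) -> bytes:
    mac_key = hashlib.sha256(b"mac" + key).digest()
    return hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()

def seal(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    """Encrypt-then-MAC: append a tag over nonce || ciphertext."""
    ct = encrypt(key, nonce, pt)
    return ct + _mac(key, nonce, ct)

def open_sealed(key: bytes, nonce: bytes, sealed: bytes) -> Optional[bytes]:
    """Verify the tag before decrypting; return None on any tampering."""
    ct, tag = sealed[:-32], sealed[-32:]
    if not hmac.compare_digest(tag, _mac(key, nonce, ct)):
        return None
    return encrypt(key, nonce, ct)

def flip(ciphertext: bytes, offset: int, known: bytes, wanted: bytes) -> bytes:
    """Attacker, no key needed: XOR (known ^ wanted) into the ciphertext at `offset`.
    Since plaintext = ciphertext ^ keystream, the same bits flip in the plaintext."""
    ct = bytearray(ciphertext)
    for i, d in enumerate(xor(known, wanted)):
        ct[offset + i] ^= d
    return bytes(ct)

PREFIX = b"Bitcoin address to pay bounty to: "
VICTIM = b"bc1qvictim00000"    # placeholder, not a real address
ATTACKER = b"bc1qattacker000"  # placeholder, same length as VICTIM
```

Applying `flip(ct, len(PREFIX), VICTIM, ATTACKER)` to the output of `encrypt` yields a ciphertext that decrypts to the attacker's address with the rest of the report intact; applied to the output of `seal`, `open_sealed` rejects it and returns `None`.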

Adding a digital signature to the encrypted message wouldn't have magically made it secure: I would just rip that one off and replace it with my own-- FooSmith can't authenticate a signature here; the authentication is "common membership inside an encrypted message", and without authentication that can't work securely.

There are other attacks when the encryption lacks authentication. Imagine you run a network service that accepts encrypted messages and decrypts them, then reports back various distinct result messages based on what the input decrypted to.

I have an encrypted message for your service authored by someone else and I'd like to learn about its content. Without auth I could start sending it to you over and over again, flipping bits in it to learn about the content. In some cases, when the planets align just right, this kind of bug lets you use the service as a decryption oracle-- you can get the entire encrypted message!

(Toy example: if the service reports the input in an error message, simply corrupting the first bit might instantly get you the content. But it can be much more complex and subtle than that.)
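A slightly less trivial variant of that toy example, as a sketch: a hypothetical service that rejects any request whose plaintext contains a NUL byte. Against unauthenticated stream encryption, an attacker XORs a guess into one ciphertext byte at a time; the resulting plaintext byte is zero exactly when the guess equals the original byte, so the NUL error reveals it. Assumptions: a SHA-256 counter-mode keystream and an HMAC-SHA256 tag stand in for real primitives, the secret contains no NUL bytes, and `make_service`/`recover` are hypothetical names. With the tag checked first, every modified request gets the same "authentication failed" reply, and the oracle disappears.

```python
import hashlib
import hmac
from typing import Callable, Optional

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Toy stream cipher: SHA-256 in counter mode."""
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def encrypt(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    """XOR with the keystream; the same call decrypts."""
    return bytes(x ^ y for x, y in zip(pt, keystream(key, nonce, len(pt))))

def _mac(key: bytes, nonce: bytes, ct: bytes) -> bytes:
    mac_key = hashlib.sha256(b"mac" + key).digest()
    return hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()

def seal(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    ct = encrypt(key, nonce, pt)
    return ct + _mac(key, nonce, ct)

def make_service(key: bytes, nonce: bytes, authenticated: bool) -> Callable[[bytes], str]:
    """Hypothetical service: decrypt the request and reply with a status string."""
    def handle(request: bytes) -> str:
        ct = request
        if authenticated:
            ct, tag = request[:-32], request[-32:]
            if not hmac.compare_digest(tag, _mac(key, nonce, ct)):
                return "error: authentication failed"
        pt = encrypt(key, nonce, ct)
        if b"\x00" in pt:  # an ordinary input check that becomes an oracle
            return "error: NUL byte in input"
        return "ok"
    return handle

def recover(ciphertext: bytes, query: Callable[[bytes], str]) -> Optional[bytes]:
    """Attacker: learn each plaintext byte from the service's replies alone."""
    recovered = bytearray()
    for i in range(len(ciphertext)):
        for guess in range(1, 256):
            probe = bytearray(ciphertext)
            probe[i] ^= guess
            if query(bytes(probe)) == "error: NUL byte in input":
                recovered.append(guess)  # pt[i] ^ guess == 0, so pt[i] == guess
                break
        else:
            return None  # the replies revealed nothing about this byte
    return bytes(recovered)
```

At most 255 queries per byte recover the whole secret from the unauthenticated service, while `recover` against the authenticated one returns `None` on the very first byte.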

This isn't to say that you couldn't build a security protocol that didn't use authed encryption... you can, but without auth the encryption doesn't form a nice abstracted layer and much more of the application has to be analyzed from the perspective of cryptographic attacks. History has shown people fail to do this well, so authed encryption should almost always be used unless there is a really good reason why it can't be.

'nullc and I are talking about the same thing.