by jcims 1301 days ago
A specific class of parsers

>parsers for untrusted input in C

2 comments

Pretty much all input is untrusted unless it originated (exclusively!) from something with more permissions that is trustworthy.

The kernel is written in C.

So that pretty much means all parsers written in C and every other language should consider all input untrustworthy, no?

Linux is probably the most carefully constructed C codebase in existence and still falls into C pitfalls semi-regularly. Every other project has no hope of safely using C. It's looking more and more like Linux should be carefully rewritten in Rust. It's a monstrous task but I can see it happening over the next decade.
I agree with the spirit of your comment.

> Linux is probably the most carefully constructed C codebase in existence and still falls into C pitfalls semi-regularly.

My guess is that it would actually be OpenBSD, but I'm not sure either way.

That didn't answer anything. If you want to do anything with your input, you have to run it through a parser. Doesn't matter if it's untrusted or not. Your only options are ignoring the input, echoing it somewhere, or parsing it.
They're saying don't write such a parser in C. Use something else (memory safe language, parser generator, whatever).
And then do what with it? Throw it away?

If it hands it to a C program, that C program needs to parse (in some form!) those values!

How is a C program expected to ever do anything if it can’t safely handle input?

Untrusted input -> memory-safe parser -> trusted input -> C program.

Probably not that important for `ls`, probably worth it for OpenSSL.

The challenge, of course, is the link to the ‘memory safe parser’: the untrusted input still has to get to it through C, correct?
Right, but you do have the option of writing that parser in a language other than C. And given how often severe security issues are caused by such parsers written in C, one probably ought to choose a different language, or at least use C functions and string types that store a length rather than relying on null termination.
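To make the "string types that store a length" suggestion concrete, here is a minimal sketch. The `str_view` type and the `sv_*` helpers are names made up for illustration, not part of any standard library; the idea is only that every access goes through a bounds check against an explicit length instead of scanning for a NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string view: a pointer plus an explicit
   length, with no reliance on a terminating NUL byte. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* Wrap a buffer whose length the caller already knows. */
static str_view sv_from_buf(const char *buf, size_t len) {
    assert(buf != NULL || len == 0); /* programmer invariant, not input validation */
    str_view v = { buf, len };
    return v;
}

/* Bounds-checked byte access: returns false instead of reading past the end. */
static bool sv_at(str_view v, size_t i, char *out) {
    if (i >= v.len) return false;
    *out = v.ptr[i];
    return true;
}

/* Bounds-checked sub-view. The comparison is written so that
   start + n can never overflow. */
static bool sv_slice(str_view v, size_t start, size_t n, str_view *out) {
    if (start > v.len || n > v.len - start) return false;
    out->ptr = v.ptr + start;
    out->len = n;
    return true;
}

/* Compare against a C string literal without assuming v is NUL-terminated. */
static bool sv_equals(str_view v, const char *lit) {
    size_t n = strlen(lit);
    return v.len == n && (n == 0 || memcmp(v.ptr, lit, n) == 0);
}
```

With something like this, `sv_slice(v, 0, 3, &method)` on the six bytes `GET /x` succeeds, while asking for three bytes starting at offset 4 is rejected rather than read out of bounds.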
> Why are people still using parsers for untrusted input in C?

No matter what the parser itself is written in, if you're writing in C you'll be using the parser in C.

If you have the input in a buffer of known length in C, hand it off to a (dynamic or static) library written in a safe language, and get back trusted parsed output, then there's much less attack surface in your C code.
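A rough sketch of what the C side of that hand-off could look like. Everything here is an assumption for illustration: `safe_parse_header` stands for a function exported by the safe-language library, the six-byte header layout is invented, and the C body given below is only a stand-in so the example compiles on its own. The point is the shape of the boundary: pointer plus length in, a fixed-size struct with no pointers out, and a return code the caller has to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-size, pointer-free result: nothing in here points back into the
   untrusted buffer. */
typedef struct {
    uint16_t version;
    uint32_t payload_len;
} parsed_header;

/* Hypothetical entry point. In the setup described above it would be
   exported by a library written in a memory-safe language; this C body is
   a stand-in only. Made-up layout: 2-byte big-endian version, then a
   4-byte big-endian payload length. Returns 0 on success, -1 on rejection. */
static int safe_parse_header(const uint8_t *buf, size_t len, parsed_header *out) {
    if (buf == NULL || out == NULL || len < 6) return -1;
    out->version = (uint16_t)(((unsigned)buf[0] << 8) | buf[1]);
    out->payload_len = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
                       ((uint32_t)buf[4] << 8)  |  (uint32_t)buf[5];
    /* Reject a header that claims more payload than was actually received. */
    if (out->payload_len > len - 6) return -1;
    return 0;
}

/* The only C code that touches the raw bytes: pass pointer and length
   across the boundary, then use nothing but the validated result. */
static int handle_packet(const uint8_t *buf, size_t len, uint32_t *payload_len) {
    parsed_header h;
    if (safe_parse_header(buf, len, &h) != 0) return -1;
    assert(h.payload_len <= len); /* guaranteed by the parser's contract */
    *payload_len = h.payload_len;
    return 0;
}
```

The C caller still has to get `len` right once, at the point where the bytes arrive, but that is a single place to audit rather than every line of a hand-written parser.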
The issue in many of these cases is there appears to be no canonical safe way to know the length of the input in C, and people apparently screw up keeping track of the lengths of the buffers all the time.
This is why you reduce the amount of C code that has to keep track of it to as little as possible.
1. Well, don’t write in C, then, if your program is security critical or going to be exposed over a network. Sure, there are some targets that require C, but that’s not the case for the vast majority of platforms running OpenSSL.

2. That’s still less of a problem as the C will then be handling trusted data validated by the safe language.

If you make argument 2) could you explain how writing a parser is more security critical than any other code that has a (direct or indirect) interaction with the network? At least recursive descent parsers are close to trivial. I usually start by writing a "next_byte" function and then "next_token". You'll have to look very hard to find any pointer code there. It's close to impossible to get this wrong and I don't see how the fact that it's a parser would make it any more dangerous.
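For reference, a minimal sketch of the "next_byte" then "next_token" structure described here. The token kinds, the `lexer` struct, and the tiny grammar (unsigned decimal numbers and single-byte punctuation) are assumptions chosen for illustration. The only place the buffer is indexed is behind a `pos < len` check, and the one non-obvious step is the overflow check when accumulating a number.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Lexer state: buffer, its length, and the current position. */
typedef struct {
    const unsigned char *buf;
    size_t len;
    size_t pos;
} lexer;

typedef enum { TOK_EOF, TOK_NUM, TOK_PUNCT, TOK_ERROR } tok_kind;

typedef struct {
    tok_kind kind;
    unsigned long num; /* valid when kind == TOK_NUM */
    int punct;         /* valid when kind == TOK_PUNCT */
} token;

static void lexer_init(lexer *lx, const void *buf, size_t len) {
    assert(buf != NULL || len == 0); /* programmer invariant */
    lx->buf = buf;
    lx->len = len;
    lx->pos = 0;
}

/* Look at the next byte (0..255) without consuming it; -1 at end of input. */
static int peek_byte(const lexer *lx) {
    if (lx->pos >= lx->len) return -1;
    return lx->buf[lx->pos];
}

/* Consume and return the next byte (0..255); -1 at end of input. */
static int next_byte(lexer *lx) {
    if (lx->pos >= lx->len) return -1;
    return lx->buf[lx->pos++];
}

static token next_token(lexer *lx) {
    token t = { TOK_EOF, 0, 0 };
    int c;
    /* Skip whitespace. */
    while ((c = peek_byte(lx)) == ' ' || c == '\t' || c == '\n' || c == '\r')
        (void)next_byte(lx);
    c = next_byte(lx);
    if (c < 0) return t; /* end of input */
    if (c >= '0' && c <= '9') {
        t.kind = TOK_NUM;
        t.num = (unsigned long)(c - '0');
        while ((c = peek_byte(lx)) >= '0' && c <= '9') {
            unsigned long d = (unsigned long)(c - '0');
            /* Refuse numbers that would overflow instead of wrapping. */
            if (t.num > (ULONG_MAX - d) / 10) { t.kind = TOK_ERROR; return t; }
            t.num = t.num * 10 + d;
            (void)next_byte(lx);
        }
        return t;
    }
    t.kind = TOK_PUNCT; /* any other byte is a one-byte punctuation token */
    t.punct = c;
    return t;
}
```

Fed the six bytes `12 +(7`, repeated calls to `next_token` yield a number 12, `+`, `(`, a number 7, and then `TOK_EOF`. Whether this stays as simple once the grammar needs length fields, nested structures, or copies into fixed-size buffers is the part the rest of the thread is arguing about.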
Well, if you're dealing with a struct then the compiler will provide type safety if, say, you try to access a field that doesn't exist. You don't get the same safeguards when dealing with raw bytes. Admittedly, in C you can also run into these hazards with arrays and strings, which is why I suggest using non-standard array and string types which actually store the length if you insist on using C.
> It's close to impossible to get this wrong and I don't see how the fact that it's a parser would make it any more dangerous.

I can answer that one. The parser is more dangerous because a parser, essentially by definition, takes untrusted input.

Nothing the parser does is any more dangerous than the rest of the code; it's all about the parser's position in the data flow.

I agree that the original statement encourages that interpretation, but I think it admits the interpretation that the parser itself is in C and I think that is what was intended.
Even if application constraints mean you can't write a parser in another language that's linkable to C, why couldn't you use a parser generator that outputs C?