| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jcims 1348 days ago
	A specific class of parsers >parsers for untrusted input in C

2 comments

lazide 1348 days ago

Pretty much all input is untrusted unless it originated (exclusively!) from something with more permissions that is trustworthy.

The kernel is written in C.

So that pretty much means all parsers written in C and every other language should consider all input untrustworthy, no?

link

Gigachad 1348 days ago

Linux is probably the most carefully constructed C codebase in existence and still falls in to C pitfalls semi regularly. Every other project has no hope of safely using C. It's looking more and more like Linux should be carefully rewritten in Rust. It's a monstrous task but I can see it happening over the next decade.

link

Shared404 1347 days ago

I agree with the spirit of your comment.

> Linux is probably the most carefully constructed C codebase in existence and still falls in to C pitfalls semi regularly.

My guess is that it would actually be OpenBSD, but I'm not sure either way.

link

thaumasiotes 1348 days ago

That didn't answer anything. If you want to do anything with your input, you have to run it through a parser. Doesn't matter if it's untrusted or not. Your only options are ignoring the input, echoing it somewhere, or parsing it.

link

Diggsey 1348 days ago

They're saying don't write such a parser in C. Use something else (memory safe language, parser generator, whatever).

link

lazide 1348 days ago

And then do what with it? Throw it away?

If it hands it to a C program, that C program needs to parse (in some form!) those values!

How is a C program expected to ever do anything if it can’t safely handle input?

link

tsimionescu 1347 days ago

Untrusted input -> memory-safe parser -> trusted input -> C program.

Probably not that important for `ls`, probably worth it for OpenSSL.

link

lazide 1347 days ago

The challenge of course is the links to the ‘memory safe parser’, or how it gets from the untrusted input to it mediated by C, correct?

link

nicoburns 1348 days ago

Right, but you do have the option of writing that parser in a language other than C. And given how often severe security issues are caused by such parsers written in C, one probably ought to choose a different language, or at least use C functions and string types that store a length rather than relying on null termination.

link

thaumasiotes 1348 days ago

>>>>> Why are people still using parsers for untrusted input in C?

No matter what the parser itself is written in, if you're writing in C you'll be using the parser in C.

link

wongarsu 1348 days ago

If you have the input in a buffer of known length in C, hand it off to a (dynamic or static) library written in a safe language, and get back trusted parsed output, then there's much less attack surface in your C code.

link

lazide 1348 days ago

The issue in many of these cases is there appears to be no canonical safe way to know the length of the input in C, and people apparently screw up keeping track of the lengths of the buffers all the time.

link

saagarjha 1347 days ago

This is why you reduce the amount of C code that has to keep track of it to as little as possible.

link

nicoburns 1348 days ago

1. Well don’t write in C then if your program is security critical or going to be exposed over a network. Sure, there are some targets that require C, but that’s not the case for the vast majority of platforms running OpenSSL.

2. That’s still less of a problem as the C will then be handling trusted data validated by the safe langauge.

link

jstimpfle 1348 days ago

If you make argument 2) could you explain how writing a parser is more security critical than any other code that has a (direct or indirect) interaction with the network? At least recursive descent parsers are close to trivial. I usually start by writing a "next_byte" function and then "next_token". You'll have to look very hard to find any pointer code there. It's close to impossible to get this wrong and I don't see how the fact that it's a parser would make it any more dangerous.

link

nicoburns 1348 days ago

Well if you're dealing with a struct then the compiler will provide type safety if say you try to access a field that doesn't exist. You don't get the same safeguards when dealing with raw bytes. Admittedly in C you can also run into these hazards with arrays and strings, which I why I suggest using non-standard array and string types which actually store the length if you insist on using C.

link

thaumasiotes 1348 days ago

> It's close to impossible to get this wrong and I don't see how the fact that it's a parser would make it any more dangerous.

I can answer that one. The parser is more dangerous because a parser, essentially by definition, takes untrusted input.

Nothing the parser does is any more dangerous than the rest of the code; it's all about the parser's position in the data flow.

link

dllthomas 1348 days ago

I agree that the original statement encourages that interpretation, but I think it admits the interpretation that the parser itself is in C and I think that is what was intended.

link

harshreality 1348 days ago

Even if application constraints mean you can't write a parser in another language that's linkable to C, why couldn't you use a parser generator that outputs C?

link