Hacker News new | ask | show | jobs
by jacquesm 1069 days ago
We're very much in agreement.

The whole 'null pointer style strings' makes no sense, I think they want to say 'nul terminated'. But fine.

Your examples are excellent, let me add a few more:

Big endian? Little endian? Do we count characters or bytes? Who owns the bloody thing? Can they be modified in place? Are they in ROM or RAM? Automatic? Static? Can they be transmitted over a network 'as is' or do they need to be sent via some serialization mechanism? What about storing them on disk? And can they then be retrieved on different architectures?

The problem really is that C more or less requires you to really know what you're doing with your data and that's impossible in a networked world because your toy library ends up integrated into something else and then that something else gets connected to the internet and suddenly all those negative test cases that you never thought of are potential security issues. So any simplistic view of string handling will end up with a broken implementation regardless of how well it worked in its initial target environment.

C's solution is simple: take the simplest possible representation and use that, pass responsibility back to the programmer for dealing with all of the edge cases. The problem is that nobody does and even those that try tend to get it subtly wrong several times across a codebase of any magnitude.

It's a nasty little problem and it will result in security issues for decades to come. There are plenty of managed languages, I had some hope (as a seasoned C programmer) that instead of this Cambrian explosion of programming languages that we'd have some kind of convergence so that it becomes easier, not harder to pick a winner and establish some best practices. But it seems as though cooperation is rare, much more common is the mode where a defect in one language or eco system results in a completely new language that solves that one problem in some way (sometimes quite convoluted) at the expense of introducing a whole raft of new problems. Besides the fractioning of mindshare.

1 comments

It's not a hypothesis, the thing was already implemented many times in C, C++ and other languages and used for ages especially for networked code, because C "there's no length" approach is a guaranteed vulnerability.
It's not a guaranteed vulnerability, it's a potential vulnerability.

Guaranteed doesn't mean "this will probably happen", it means "this will definitely happen".

The "no length approach" can probably result in a vulnerability. It won't definitely result in a vulnerability.

I mean, come one, if it was a guaranteed vulnerability, almost nothing on the internet would work because they all have, somewhere down the line, a dependency on a nul-terminated string.

I mean, do you think that nginx (https://github.com/nginx/nginx/blob/master/src/core/ngx_stri...) is getting exploited millions of times per hour because they have a few uses for nul-terminated strings?

nginx whacks one mole at a time https://cve.circl.lu/cve/CVE-2013-2028
That CVE has absolutely nothing to do with length up front vs nul terminated strings. It's also two years old. The only thing it does is reference nginx but that's disingenuous, unless the point you're trying to make is that nginx has the occasional security issue, which I think we're all very much aware of. But it doesn't answer the GPs point in any relevant way.
The problem there is in opportunistic bound checking due to loose association of an array with length, string being an example of an array. This vulnerability is a direct consequence of C "there's no length" approach and shows why this approach in unsuitable for networked code.
In C a string is not an example of an array. If we can't agree on terminology for a discussion that requires extreme precision it becomes difficult to keep going.

Networked code does not as a rule use C style nul terminated strings though, in the case of fixed length buffers they will usually be accompanied either by a length field or by zeroing out the end of the string or even the whole buffer (the latter is much better and ensures you don't accidentally leak data from one session to another).

Networked code doesn't have to be written in C to begin with. Regardless of implementation there usually is a protocol spec and you adhere to that spec and if you don't then you'll find out the hard way why it matters.

This particular vulnerability has nothing at all to do with C strings but in fact has everything to do with a broken implementation of length based strings, which could result in the length being negative, which is at least one problem which C style strings do not have... (small comfort there, they have plenty of other problems, but that one they don't.).

This is the fix for that particular CVE:

https://github.com/nginx/nginx/commit/4997de8005630664ab35f2...

Which stems from integer overflow after doing arithmetic on the lengths.

It looks to me as though you just pulled the first nginx CVE that you found and posted it without looking at what the CVE was all about, without realizing that the ancestor comment was referring to the string implementation inside nginx which lives in the referenced file, whereas you are pointing to a CVE related to the parsing of HTTP chunked data requests, which resides in an entirely different file and has nothing to do with string handling to begin with.

Which C compilers are those then?

Also, you keep writing 'null pointer' and 'null', there is a pretty big difference between 'null' and 'nul' and in the context of talking about language implementation details such little things matter a lot. You say a lot of stuff with great authority that simply doesn't match my experience (as a C programmer of many decades) and while I'm all open to being convinced otherwise you will have to show some references and examples.

What doesn't match your experience?
My experience as a programmer of some 40 years in C has yet to expose me to a C compiler that has length based rather than nul terminated strings as the base string type. Please point me to one in somewhat widespread use rather than an experimental implementation that uses this concept and make sure not to confuse libraries with the implementation of the language.
Since no C/C++ compiler supports it, for them implementation is in a library.
So that means they are not part of C/C++. Which was the point. You can write software in C/C++ but that's hardly news and you can use that to create new data types that are not in the language, which also is hardly news.