Hacker News
by lbarrow 4450 days ago
It's super cool to see the power of advanced static analysis these days. Props to the Coverity team for using the Heartbleed trainwreck to motivate new research on these problems.

That said, are there other ways to fix this class of problem? We have choices. We can continue to build ever-more-advanced tools for patching over the problems of C and C++, or we can start using languages that simply do not have those problems.

There will always be a need for C and C++ in device drivers, microcontrollers, etc. But there's no compelling reason why SSL implementations in 2014 should use languages designed to run on mainframes in 1973.

4 comments

> There will always be a need for C and C++ in device drivers, microcontrollers, etc. But there's no compelling reason why SSL implementations in 2014 should use languages designed to run on mainframes in 1973.

Except that safer systems programming languages, with bounds checking by default, are older than C, and their compilers allowed the checks to be disabled if really, really required[1]:

Algol (1960)

PL/I (1964)

Modula-2 (1978)

Mesa (1979)

Even VAX, B6500 and 68000 assembly have support for doing bounds checking.

[1] Not the first version of Algol though, since according to Hoare's Turing Award speech, customers didn't want unsafe features:

<quote> A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law. </quote>

Well, usually this stuff is written in C because other languages come with big runtimes that make them unsuitable for utility libraries that need to be callable by everyone and from everywhere.

That said, we're of course working on changing that with Rust. But I should note that memory safety without garbage collection is just hard: it requires the entire language design to be balanced on a delicate precipice. It's not surprising that it's taken a long time to get there.

> But I should note that memory safety without garbage collection is just hard: it requires the entire language design to be balanced on a delicate precipice.

I was thinking about this recently and I think a large part of the problem is that C arrays are too weakly typed. Array should be a different type than pointer and they shouldn't be convertible. In particular, you shouldn't be able to subscript a pointer, and the in-memory representation of an array should begin with its length. At that point the compiler can include a runtime bounds check for every array access that it can't prove is safe at compile time.
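A minimal sketch of that idea in plain C, assuming a hypothetical `struct int_array` whose in-memory representation begins with its length, and accessor functions that check every index at run time. All names here are made up for illustration; a compiler doing this itself could also drop the checks it can prove unnecessary, which hand-written wrappers cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical array type: the length is stored in front of the elements. */
struct int_array {
    size_t len;
    int data[];   /* C99 flexible array member */
};

/* Allocate a zero-initialised array of n ints; returns NULL on failure. */
static struct int_array *int_array_new(size_t n)
{
    /* Guard the size computation against overflow. */
    if (n > (SIZE_MAX - sizeof(struct int_array)) / sizeof(int))
        return NULL;
    struct int_array *a = calloc(1, sizeof *a + n * sizeof(int));
    if (a != NULL)
        a->len = n;
    return a;
}

/* Checked read: stores the element in *out, or returns false if i is out of range. */
static bool int_array_get(const struct int_array *a, size_t i, int *out)
{
    assert(a != NULL && out != NULL);
    if (i >= a->len)
        return false;
    *out = a->data[i];
    return true;
}

/* Checked write: returns false instead of writing past the end. */
static bool int_array_set(struct int_array *a, size_t i, int value)
{
    assert(a != NULL);
    if (i >= a->len)
        return false;
    a->data[i] = value;
    return true;
}
```

With this layout, `int_array_get(a, a->len, &v)` returns false rather than reading whatever happens to sit after the array.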

Sadly, bounds checking is the easy part; making sure that pointers can't become dangling is the hard one. Unfortunately, use-after-free is an extremely common security vulnerability in the wild (though it wasn't Heartbleed). We solved use-after-free in Rust, at least technically, although there were quite a few tradeoffs we had to make and the usability of the borrow checker is something we're going to need to work on post 1.0.
The more I read about Rust, the more excited I am about it. When I wrote the great-grandparent comment I had Rust in mind.
It isn't hard to do this manually in C; it is just that standard libraries still don't. Partly that is because there is often oneish way to do things that is general and often unsafe, and a variety of ways to do things that are safe but more specific to a particular usage pattern. IMO this is also why floating point is popular; it is rarely the best solution to any particular situation, but one solution that works OK is often considered better than 10 solutions that work better in particular cases. Not that this is a good excuse in the case of bounds checking...
>runtime bounds check

The reason people use C is to get higher speed by avoiding those kinds of checks. A real solution must avoid slowdowns.

If the reason was to avoid those checks, then people really don't know better.

Turbo Pascal => {$R-}

Ada => pragma Suppress (Index_Check);

Modula-3 => cm3 -A

D => dmd -noboundscheck / array.ptr[index] in @system code

Just a few examples. Additionally, there is a lot of research on how compiler optimizations can eliminate bounds checks:

http://citeseer.uark.edu:8080/citeseerx/showciting;jsessioni...

What C has going for it is the symbiotic relationship with UNIX, 40 years of optimizations in most compilers and the historical baggage of being around for so long.

Other, better languages suffered from having been shown the door as UNIX spread into the enterprise, but there was a time when C compilers didn't generate better code than the other ones.

You can encode the length of an array in the type in C++, at compile time.
Trevor Perrin (of TACK fame) wrote TLS Lite in Python.

I submitted a link to TLS Lite a few days ago, but, alas, showed poor judgement in timing:

https://news.ycombinator.com/item?id=7564740

Direct link: http://trevp.net/tlslite/

I'm actually rather anxious to hear the knowledgeable crowd discuss this fine project.

It's fantastic if you want to build TLS testing tools, or if you want a codebase to reason about TLS with.
A stamp of approval if ever there was one. Thank you.

What, however, hinders adoption as a "working man's" TLS library? Neglecting performance and variety of cipher support, would or should anything prevent me from using TLS Lite to secure channels between "inner circle machines" (that talk to a set of well-known participants)?

My advice is not to use obscure TLS libraries in production. Look at the recent Frankencerts paper to see what goes wrong: only OpenSSL, NSS, and Bouncycastle (the mainstream libraries) properly rejected pathological X.509 certificates.

If you're trying to deploy pure-Python applications, I like tlslite. Of course, I have to say that, because Trevor is much smarter than me.

Personally, I think your realistic production choices are OpenSSL or NSS.

Vacillating between floating away on my recently-inflated feeling of self-worth and trying to keep you engaged on a very uneven playing field (as in my not knowing Adam from Eve, so to speak), I'll simply opt for another Thank you.
Apropos nothing else: tlslite and Adam Langley's golang/crypto/tls are the two best codebases on the Internet to (a) learn TLS from and (b) build tools with. They are both extremely great projects.
The TLS library that our company uses for its applications is http://www.yassl.com/yaSSL/Home.html. Not sure if it's considered obscure or not.
Now you just need agl and pbsd to come along and tlslite will hit the HN crypto-trifecta.
You can fix almost any issue in C and use almost any programming paradigm. Some people consider this a bug but I consider it a feature. There is no reason an SSL implementation written in C should be vulnerable to buffer overflow. Since c99's anonymous data allocation it is even possible to have a string plus length struct and a single macro to convert C strings to that structure that can be used in function arguments. It is true that C at best doesn't help correct behavior and sometimes actively encourages incorrect behavior, and I think this is a serious issue in the language. However, other languages take a "let me do things right for you" approach that works well as long as your definition of right sufficiently matches the language's. But as paradigms come and go it is easy to hit corner cases in such languages while C still works or can be adapted fairly easily, which makes it a good choice for core functionality if not everything.

IMO, C fails at being sufficiently low level to do this as well as it could. I don't think C will ever be replaced by a higher level language; it will be replaced with a lower level language that is better at incorporating static analysis of particular usage patterns into the language. To put it another way: C makes you do a bunch of "extra" work compared to other languages without really helping you do that work; other popular languages I know of try to not make you do that "extra" work, which is often a good thing but not always. A true replacement for C will need to still make you do all the "extra" work but help you make sure that work is correct.

E.g. memory allocation: it is not that manual memory allocation and deallocation is necessarily unreliable, but that there is no single way to make it reliable that works well for all programs. But there is also no way to do automatic memory management that works well for all programs.

I would never recommend C++, but my sense is that the current popularity of C++ might be connected to templates, which are flexible and powerful in a different way than C or any other language I know of.

(also C was designed for "minicomputers" not mainframes, so not really all that different from its modern usage)

> c99's anonymous data allocation

I googled that phrase, and came up empty handed. Could you give an example?

Sorry, I always forget what they are officially called, which is compound literals. You can do ((struct foo){...}). GCC had a different syntax for this before c99 but the only standard way to do such allocations was in a new block, which doesn't help for a function argument.

Edit: also worth mentioning that some of the early implementations of this were very inefficient so you might see complaints from that time, but this is not an issue with modern compilers.
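For readers who have not met compound literals, here is a small sketch of the "string plus length struct and a single macro" idea, assuming hypothetical names `struct str`, `STR_LIT`, `str_from_cstr` and `str_equal` (none of them come from a real library). The point is that the struct can be built directly inside a function argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string-plus-length struct; the name is illustrative only. */
struct str {
    const char *ptr;
    size_t len;
};

/* Wrap a string literal in a C99 compound literal.
   sizeof counts the terminating '\0', hence the -1.
   The "" s "" concatenation only compiles when s is a string literal. */
#define STR_LIT(s) ((struct str){ .ptr = "" s "", .len = sizeof(s) - 1 })

/* Wrap a run-time C string; a function avoids evaluating s twice. */
static inline struct str str_from_cstr(const char *s)
{
    assert(s != NULL);
    return (struct str){ .ptr = s, .len = strlen(s) };
}

/* Compare two length-carrying strings without relying on '\0' termination. */
static int str_equal(struct str a, struct str b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

A call then looks like `str_equal(STR_LIT("hello"), str_from_cstr(buf))`, with no temporary variable and no extra block needed for the literal.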

Thanks for pointing this out. I also tried and failed to find "anonymous data allocations". Although (having just read about them) I fail to see how I'd use your proposed string structure to retro-fit an existing codebase using good old char* strings.
The basic idea is to wrap all use of arrays with wrapper functions or macros that take the structure with length and have those wrappers access the array after performing bounds checking. So you would want to never pass a raw array as a function argument, but before compound literals there was fundamentally no good way to change a string literal to such a structure in a function argument that wouldn't violate other basic programming principles. There are still a variety of implementation choices and it is by no means an easy retrofit, but the result can be entirely reasonable. Interacting with 3rd party and standard library code can be awkward and depends on implementation choices (e.g. '\0' termination or not, and the related question of how much of libc you rewrite). There have been a number of string libraries over the years making various implementation choices, but no general implementation has become all that popular and it seems like most implementations are project specific.
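One possible shape for such wrappers, as a sketch only: `struct bytes`, `struct buffer` and the three functions below are hypothetical, and a real project would make its own choices about error reporting and '\0' termination. Every access goes through a function that checks the requested range against the recorded length first, so a request for more bytes than were actually supplied is refused instead of reading adjacent memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only view: a pointer plus the number of valid bytes. */
struct bytes {
    const unsigned char *ptr;
    size_t len;
};

/* Hypothetical writable buffer with a fixed capacity. */
struct buffer {
    unsigned char *ptr;
    size_t cap;
    size_t len;   /* bytes currently in use, always <= cap */
};

/* Checked element read. */
static bool bytes_at(struct bytes b, size_t i, unsigned char *out)
{
    if (i >= b.len)
        return false;
    *out = b.ptr[i];
    return true;
}

/* Checked sub-range: yields [off, off + n) only if it lies inside src. */
static bool bytes_slice(struct bytes src, size_t off, size_t n, struct bytes *out)
{
    if (off > src.len || n > src.len - off)   /* written to avoid overflow */
        return false;
    *out = (struct bytes){ .ptr = src.ptr + off, .len = n };
    return true;
}

/* Checked append: copies src into dst only if it fits in the remaining space. */
static bool buffer_append(struct buffer *dst, struct bytes src)
{
    assert(dst != NULL && dst->len <= dst->cap);
    if (src.len > dst->cap - dst->len)
        return false;
    if (src.len > 0)
        memcpy(dst->ptr + dst->len, src.ptr, src.len);
    dst->len += src.len;
    return true;
}
```

Callers never index `ptr` directly; code that has to hand data to libc or a third-party API would unwrap the struct at that boundary.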