Hacker News new | ask | show | jobs
by lucian1900 4950 days ago
Perhaps using safer languages (and languages with better error reporting) would be a solution to these kinds of problems.
8 comments

Much of the article is dedicated to hardware level memory problems. Most of the 'safer' languages I know about would not really help you in those situations. Maybe there are some odd mainframe languages that read and write everything twice or something like that, but most 'safer' languages I know about are more oriented towards preventing programmer errors. What antirez is saying that, yes, they get some of those, but many others, due to how widely used the system is, that are genuine memory failures, that cost a lot of time to track down.
Not really. We get all the same sorts of errors in our very high level C#/Asp.Net/VMware deployments and it's a shit load harder to debug with all the extra baggage that a VM and hypervisor throw on top as well...

A better solution to all the reliability problems is better quality hardware i.e. not X86. X86 has very few reliability features built in past ECC. If you look at UltraSparc based machines, they can predict failures and offline chunks of the hardware (CPUs, RAM regions, IO devices) so they can be replaced without disrupting the system.

Prevention is better than debugging :)

I was thinking more of something like Rust, where only a small subset of your code would be poking at memory manually. Then when you do get a mysterious crash, you only need to look at the unsafe portions of your application (or perhaps at the compiler).
Maybe this was just the bias of the post, but it sounded from the post that most time is spent ruling out actual memory errors. Programming in a higher-level language doesn't make such errors less likely or easier to identify.

It's true that there are certain classes of errors that "safe" languages make less likely or impossible. I'm not convinced there are enough fewer of these to make up for the additional classes of errors introduced by such languages.

What additional classes of errors does e.g. Rust introduce?
I assume he's talking about errors in the runtime itself.

For example, you could write a secure Java app (let's assume this is possible), but if the JVM has a security flaw (and it did), it doesn't really matter, because the runtime is mandatory.

Over the years, I've become more and more distrustful of large, complicated systems designed "for my safety", and migrated towards tiny systems + formal verification/model checking/exhaustive symbolic testing.

100% certified and bug free (in the sense of "doesn't implement the spec") is where I think at least the safety and security critical part of the industry will be in 30 years. The tools available today are incredibly powerful, but systems designed over 5-10 years ago are unable to take advantage of them. There's no reason programs cannot be bug free with respect to their specifications today, given the tools we have at our disposal.

Your point is definitely valid. But Rust in particular does not have a large, complicated runtime. It's pretty small (it's just thread support, basically), and it's going to get smaller over time.

It would be great to formally verify the runtime and task infrastructure. In fact people have started to create (small, incomplete) Promela models of the message passing infrastructure. However, it's lower priority than figuring out what works from a pragmatic standpoint and implementing it at the moment.

Rust isn't production ready.
I never claimed it was. I'm just thinking that perhaps the approach it (and Haskell, for that matter) takes is better.
I think you're falling in to the "silver bullet" trap.

http://en.wikipedia.org/wiki/No_Silver_Bullet

Basically, making reliable software is hard. Changing the language doesn't bring anything. There are a lot of tools to make sure your C/C++ programs doesn't have obvious errors. The problem are non-obvious errors, and these errors exist in all the languages, with different forms.

Another way to put it: "You cannot reduce risk, you can only replace it with another".

I agree that nothing solves all possible problems. But it's a huge jump from "there is no silver bullet" to "changing the language doesn't bring ANYTHING". Likely, it is quite a bit more subtle in practice. Preventing "obvious" errors (which really aren't always that obvious) that safe languages could minimize is already a great start, as many systems are rife with them.
> You cannot reduce risk, you can only replace it with another".

I don't understand why you would say this. Is it not the case that using a language with automatic memory management (say, Python) is less risky than using one with manual memory manamgement (say, C or C++)?

There are tradeoffs (e.g. performance, having less control over various things), but they are not tradeoffs between one risk and another risk, they are tradeoffs between risk and something else.

This is obviously not true. Language matters. Are you as productive in BrainFuck as in C++? I don't think so. C/C++ isn't the end of all progress in language design. Modern languages eliminate or vastly reduce whole classes of concerns like memory management, wild pointers, null pointers, incomprehensible template errors, etc. Some of these problems can be patched up for perhaps 30% by a post hoc static analysis, but the effectiveness of that is much less than a language that has been designed to make these errors impossible by construction. Modern languages also make lots of stuff easier through a good standard library, language features like closures & type inference, a REPL, a good IDE (the feasibility of which is also a function of language design), etc.

Sure, they don't solve the problem of magically and instantly conjuring up the program you want; nobody is claiming that. All these little things add up nevertheless.

I think what you're missing is that antirez has gotten his C codebase perfect enough that hardware failures are the cause of most of his crash reports.

See also: http://blogs.msdn.com/b/oldnewthing/archive/2005/04/12/40756...

Um, no. C is a perfectly valid language and some of the best, most robust systems in the world are written in it (Linux, Git, etc.) Some languages are even built to run atop C (Cython.) Even the JVM deals with pointers, memory allocation issues, and such so you don't have to but it's still there!

So using a higher level or "safer" language isn't going to stop these kinds of problems.

Sorry if I'm breaking your bubble, but Linux and git are not "the best and most robust systems in the world". If they were, the state of the art of safe and reliable software systems would be quite pitiful.

edit: that doesn't detract your point however that C is used nowadays on "robust systems"... in terms of popular robust kernels though you'll want to look at something like L4 or QNX Neutrino. There's a kernel that is actually formally verified (seL4) that was first written in Haskell, then verified, then translated to C (for speed).

Because it's super widely deployed and has a very mature development process, I would expect the Linux kernel to be among the most robust software in the world.

I am interested to hear what you think is more robust than Linux, setting aside seL4. Do you think QNX Neutrino is more robust? If so, why? And what else?

I would expect vxWorks and other RTOSs to generally be less robust than Linux, despite typically going through various certifications.

The major source of errors in a kernel is device drivers, and Linux typically is running many more drivers than the embedded kernels you mentioned. Look at any Linux point release: the majority of churn is in driver code, to fix bugs.

Thus, it stands to reason, Linux is likely less stable than an embedded kernel without all that driver code.

I run a pre-emptive embedded kernel (QK) that's extremely tiny and, in fact, was validated by myself with KLEE (a symbolic checker) to exhaustively verify correctness. I'm certain it's more reliable than Linux, which carries no such guarantee (and is orders of magnitude larger -- even excluding driver code).

Bug rates correlate extremely close with lines of code. All else being equal, a large system has more bugs, simply because it has more opportunity for them.

If you truly care about correctness, doing formal verification, model checking, etc. is the way to go, not "lots of people use it so it must be stable".

To benefit from formal verification, you have to design your code around that, and most systems today are not. It's hard to retrofit verification on top of a legacy codebase like Linux.

I don't really understand your driver argument, because when deploying Linux on an embedded system, you would only use the drivers you need... so you'd have the same amount of driver code as with any other kernel.

You say that you managed to "verify correctness" on QK, but clearly, that's not an accurate statement, since AFAIK seL4 is the only kernel that has ever been "proven correct" (and even for seL4, there are some gotchas there, AFAIK).

If you truly care about correctness, doing formal verification, model checking, etc. ...

The common perception is that model checking and formal verification are still just research areas that can't be used practically beyond toy problems. Again, seL4 is the only kernel I know of that has been "proven correct," and that has taken an insane amount of manpower that is not scalable to anything more complex than seL4. That seems to be evidence that there is something to the "common perception" I stated above.

lots of people use it so it must be stable

That's not really my argument. With Linux, there is an insane amount of testing going on all the time (I mean "informal" testing.. just people using it and reporting problems if they encounter any.. though there are also farms set up that test Linux, as well). Linux must be by far the most highly-tested software in history. On the other hand, you could use formal methods on some small kernel and maybe get some help (but again, far short of proving there are no bugs, AFAIK), but the level of testing will be many orders of magnitude less. Clearly, which one wins out depends on how good the actual state of the art in formal methods is, but my impression is that many orders of magnitude more testing will win out, unless the problem you are solving is very, very small.

The entire QK[1] kernel, including all library code, is less that two thousand lines of C (and most of the library code isn't even used by the kernel itself, which is just 234 LOC). QK is a single-processor kernel (although still pre-emptive) and very well written and tested (in, literally, hundreds of millions of devices).

It was not difficult at all to run QK and it's supporting code through KLEE and exhaustively verify the properties of each function, thanks to a super-simple design and the many included assertions, preconditions, and postconditions, which KLEE helpfully proves are satisfied automatically. If I wanted a certified optimizing C compiler, I'd use CompCert[2] to compile QK, which would give me a certified kernel all the way to machine code. (I actually use Clang.)

I am familiar with the verified L4 kernel, and it is far more complex than the QK kernel I have verified. That people have already verified a kernel far more complex should be sufficient proof that a much simpler kernel can also have its correctness verified.

Stepping back, it's 2012: people should no longer be surprised when a small-but-meaningful codebase is certified correct. It's common enough now that you don't get published in a journal just because you did it.

[1] http://www.state-machine.com/qp/index.php

[2] http://compcert.inria.fr/

So I think it'd be fair to say that Linux does ok in the robustness / lines of code or robustness / devices and real world interfaces, however, in terms of absolute numbers, a smaller system is almost always going to do better.
I was going to mention safety certification[1], but if you're saying we should disregard those, then I'm not sure what you're looking for. QNX was built with real-time and safety-critical constraints in mind. Linux was not. I'm not saying QNX is generally better than linux... just that if I want to browse the web at home I'd use Linux, and if I want a kernel to control the car I drive I'd choose QNX.

[1] http://www.qnx.com/products/neutrino-rtos/safe-kernel.html

AFAIK, the kind of safety certificaitons applied to RTOSs are all about development process, and have nothing to do with actually proving correctness in a "mathematical" sense. I know very little about IEC 61508, but for DO-178B/C, it is all about development process. No certification agency wants to claim that software is actually correct, because they can't actually show that, and then they'd be blamed if something bad happened.

I have talked to people in the preempt_rt Linux community that believe Linux will come to dominate the RTOS market just like it has a lot of other stuff, and I think they have a compelling argument. Once preempt_rt is mature enough, it's hard for me to see any reason for going with something like QNX.

> Once preempt_rt is mature enough, it's hard for me to see any reason for going with something like QNX

To correct myself: except that the Linux development process is not compatible with current cerfitication standards. Despite that, it's possible that the Linux dev process is more mature and better "tracked", and just better, than what the certs actually require, so it may eventually be possible to certify it somehow. (Anyone with thoughts on this, please pipe up...) I get the impression that some preempt_rt people have looked into this.

I don't even remember the last time I saw a Linux server crash... at least I can't think of any occasion in the last 10 years or so that wasn't directly related to some hardware failure (firmware bug or actual dead hardware).

So, I don't understand what you mean by Linux not being safe and reliable...

I'm not saying it is not reliable, just that it is not robust relative to safety-critical software. It can be reliably used as a server without a problem, but whether or not it can be used in a manned spacecraft, for example, is another question of robustness entirely.
That has little to do with robustness. Linux may very well be as robust or even more robust than any OS considered for those purposes, but the question isn't robustness, it's predictability.

I even go further and say that Linux is more robust than many of these other OSes, since it is exposed to a wider range of environments and hardware combinations of varying quality and whatever bugs they trigger. If you reduce Linux to the kind of footprint that, for instance, VxWorks or QNX have, I don't believe it would be any less reliable.

But Linux isn't predictable. Many safety-critical systems only rely on the system not crashing or misbehaving, but many others rely on real-time characteristics. Those Linux doesn't have.

Putting quotes around "safer" doesn't make it any harder to make mistakes in C.

The point isn't about abstracting away the machine, but about reducing the amount of code that has no safety guarantees.

I'm not sure why you are being downvoted, as it is still an open question whether or not safer languages make it easier to write safer programs. So your comment may be a valid one.
They're being downvoted because they're wrong - the stuff talked about here will affect any kind of program at the machine level.

Having said that, I wouldn't downvote. The question is sparking a clarifying conversation so it's arguably worthwhile even though the premise if wrong.

(Forgive me if I've got some technical details wrong and I'm not a C programmer, but I do feel like this is a valid question and isn't trying to tweak the noses of C programmers. C programmers please feel free to add/correct anything I'm missing.)

C has a powerful/unsafe feature in that it allows you to directly address memory and read and write data. From a high-level, there are potential problems that can happen when doing this: (1) You might overwrite data in a memory location that you were using for something else or (2) you can write data that, say, represents a string and then read it back and try to interpret it as data that represents a number.

You can easily avoid the above two problems by using a library, while still having the option to optimize the code if the need arises. Since a product like Redis needs to be very performant in both speed and memory usage, the option to optimize code can't be emphasized enough: It is a critical feature of Redis and thus C is an excellent language to use for it's development.

* I do feel like this is a valid question*

What is your actual question?

The "nested" formatting of this page has vertically separated my post from the post I was responding to. I'll edit my post and quote the question I'm aswering for clarity.

(edit: The edit link is no longer available on my other post. Here's the question I was responding to: "Perhaps using safer languages (and languages with better error reporting) would be a solution to [the kinds of problems mentioned in the article.]")

Well, not to the memory problems. But I do think modern systems languages should be better about printing out stuff in the case of segfaults, for example. Not sure if Go and Rust do this stuff.
Such languages also allow you to write code much faster which can lead to more mistakes.

Edit: huh. 2 downvotes. Why not explain why you think I'm wrong, rather than just downvoting because you don't agree?

You get downvotes because your reply contradicts itself. A safer language results in more mistakes being found at compilation time, so would result in less mistakes in the product, not more. Also I would expect having to specify annotations for your code to be verified (ie correct type specifications, pre/postconditions, contracts, carefully specifying program input/output through parsers, etc) results in slightly slower pace of development, not faster. But I suppose if you make up more than that time in debugging and troubleshooting time, or time explaing awkward bugs to annoyed customers, it's still a win.
Compilers can't detect logic errors which tend to be more common with faster development.
Unless your language is dependently typed.
If even true, that's besides the point. How did 'faster development' get into your reasoning at all?