Hacker News
by jerf 918 days ago
Really, the only memory unsafe languages still in use are C and C++.

If it weren't for the behemoth of legacy code we'd really have this problem more-or-less licked. Unfortunately, that behemoth is still rampaging across the landscape.

"Rewrite it in Rust" gets a bit of pushback, perhaps even justified, but at this point in time I'll take anything that just reduces that behemoth in size. The journey of a thousand miles begins with a single step, an elephant is eaten one bite at a time, etc. Rust is just one of the easier and more effective options for a legacy codebase, with the unusual advantage of being able to slip in incrementally. Almost every other language requires a true rewrite.

5 comments

> If it weren't for the behemoth of legacy code we'd really have this problem more-or-less licked. Unfortunately, that behemoth is still rampaging across the landscape.

It is not only or even mostly legacy. I'm a systems programmer (in the classical sense, not "but my web service is soooo highly loaded and scalable that I will call it systems programming!") and from what I see on the job people start new projects in C and C++ all the time.

Why do they choose C/C++? Is it just what they and their colleagues already know and nobody wants to be the one to push for change? Easier integration with other C/C++ stuff?
From my experience reasons differ for C and C++ programmers.

C programmers are often more experienced people who are used to a "simple" language that gets out of the way. They don't want to invest time into learning a tricky language like Rust with all the intricacies of its type system, borrow checker, etc. Something simpler like Zig might work for them, but it is not on the table at the moment.

C++ programmers tend to be people who spent hundreds if not thousands of hours learning its ugly corner cases, reading Meyers and Alexandrescu books, that kind of thing. The sunk cost is immense: their whole careers are built on being "C++ experts" and they dread having to abandon it and learn another very complex language from scratch.

And managers often don't see value in investing time into switching projects to a new language. From their PoV it is more like programmers just want to play with a new toy instead of doing "real work".

Why are C and C++ considered the same, in these conversations?

C++ at least has tools to make life significantly more safe. I can write a buffer overflow in any language, and on the scale of difficulty, ASM-C-C++-Rust-Python covers my experience (from easiest to fuck up to hardest).

Yet nobody is calling for us to rewrite everything in python. Why is the line drawn at Rust? It's perfectly simple to trash memory in Rust.

Because "significantly more safe than C", while true, is also irrelevant. I want safe, not "safer than grotesquely unsafe". Unfortunately, for all the advances C++ has made, it is still in the "unsafe at any speed" class. It is difficult to escape the foundation of unsafety the entire C++ edifice rests on.

(At least, without further support. I consider "C/C++ with high quality static analysis" to be de facto distinct languages, and while I would favor something else even so, high-quality use of a high-quality static analyzer is enough to calm me down. Things have still crept through that level of care, but then, interpreters and compilers for safe languages have had safety errors in them before too.)

This is particularly true because it's just C and C++ that are memory unsafe. If we were still in 1980, we could be arguing about the gradients of unsafety, but in 2023, we don't need to. Unsafety is not necessary at scale.

As for why people aren't asking to rewrite in Python, I partially answered that in my post. You can actually incrementally rewrite in Rust. You can't incrementally rewrite software in Python. There is also plenty of software that can be written in C, but simply can't be written in Python because it would be too slow. (Rewriting it in Python but oh no wait I'll just write the slow bits in C is a no-op, practically.)

As for trashing memory in Rust, by perfectly reasonable convention we generally understand that unsafe is unsafe, and that while languages can't avoid having it, having it does not necessarily make the entire rest of the language just as bad as C. I can crash Haskell with a straight-up, genuine null pointer exception with the Unsafe module in a single line of code. We do not thereby call Haskell an "unsafe" language where it is trivial to trash memory. Stock Rust is far safer than C++, to the point of being not only a qualitative change, but I'd contend, multiple such qualitative changes.

Separating safe C/C++ from everyday C/C++ is not fair, in my opinion, but I get your point: If it can be abused it will be, either by accident, inexperience, or maliciousness.

Once you separate C/C++ into safe and unsafe categories, and admit that Rust has unsafe uses that are "just so much harder to use", we're clearly defining a gradient:

    C/C++, safer C/C++ subset or maybe Unsafe Rust, safer Rust, ...
Sure you can use safe C++ with some effort, but the libraries you use most likely still use unsafe C/C++. For Rust, I expect that the libraries are much safer in general.
Rust is memory safe by default, with unsafety as an optional feature that you basically never need to use unless you’re writing extremely low-level code, need absolute maximum performance, or are interfacing with libraries written in other languages.

C++ is unsafe by default.

Of course it’s just as easy to write bugs in unsafe Rust as it is in C++ (actually, it’s probably even easier), but defaults matter.

This is a common conception, and I agree to a point. However, interfaces matter. At the interface to _literally any_ system call, unsafe starts to creep out. Either in the wrapper implementation, or in the interface _to_ the system, or even leaking through the wrapper to the caller.

At that point, if we have to re-wrap everything in rust to hide the unsafety of the interfaces to the system (sockets, shared mem, etc etc), then why not just write safe cpp wrappers?

Yes, people are writing memory overflows in their own code, but I'd argue 99% of the critical security bugs are actually in the unsafe interfaces. And we don't really need a new language to fix that. We just need new interfaces.

I love Rust, but using it for anything nontrivial makes the "safe" patina really fade. You're quickly writing what feels like C, with MaybeUninit<X> all over.

> At the interface to _literally any_ system call, unsafe starts to creep out. Either in the wrapper implementation, or in the interface _to_ the system, or even leaking through the wrapper to the caller.

It’s quite rare to have to make syscalls directly in Rust, just like it is in c++. Most code in any large enough system is related to the internal logic of the system, not to its interface with the outside world. And when you _do_ need to interface with the outside world, you can use a wrapper (lots of the standard library is basically wrappers around syscalls; this is true in any language). And no, in Rust unsafety doesn’t typically “leak through” interfaces, unless those interfaces are buggy.

> why not just write safe cpp wrappers?

There’s no such thing. It’s not possible to write a safe interface to c++ code in the sense that that term is used by the Rust community. In Rust, “safe interface” means: assuming there are no bugs in the underlying code, and the client code never invokes `unsafe`, using the interface cannot cause undefined behavior. This is impossible to guarantee in c++.
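To make that definition concrete, here is a minimal sketch of what a "safe interface" over unsafe internals can look like. `ByteBuf` is a made-up type for illustration, not something from the standard library or any crate: the implementation uses `unsafe` to skip bounds checks, but each unsafe block is guarded by a check, so a caller who never writes `unsafe` cannot trigger undefined behavior through it (assuming the implementation itself has no bugs).

```rust
/// Hypothetical fixed-capacity byte buffer. The internals use `unsafe`,
/// but the public API is intended to be impossible to misuse from safe code.
pub struct ByteBuf {
    data: Box<[u8]>,
    len: usize,
}

impl ByteBuf {
    pub fn with_capacity(cap: usize) -> Self {
        ByteBuf { data: vec![0u8; cap].into_boxed_slice(), len: 0 }
    }

    /// Appends a byte. Returns `false` instead of writing past the end.
    pub fn push(&mut self, byte: u8) -> bool {
        if self.len == self.data.len() {
            return false;
        }
        // SAFETY: `self.len < self.data.len()` was checked just above.
        unsafe { *self.data.get_unchecked_mut(self.len) = byte };
        self.len += 1;
        true
    }

    /// Reads a byte that was previously pushed, or `None` if out of range.
    pub fn get(&self, i: usize) -> Option<u8> {
        if i < self.len {
            // SAFETY: `i < self.len` and `self.len <= self.data.len()`.
            Some(unsafe { *self.data.get_unchecked(i) })
        } else {
            None
        }
    }
}

fn main() {
    let mut buf = ByteBuf::with_capacity(2);
    assert!(buf.push(1));
    assert!(buf.push(2));
    assert!(!buf.push(3)); // full: rejected rather than overflowing
    assert_eq!(buf.get(1), Some(2));
    assert_eq!(buf.get(2), None);
}
```

The point is that the burden of proof sits inside the two `unsafe` blocks; nothing a caller passes to `push` or `get` can break the invariants.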

> I love Rust, but using it for anything nontrivial makes the "safe" patina really fade. You're quickly writing what feels like C, with MaybeUninit<X> all over.

This is not true at all in my experience. I work on Materialize, surely one of the more non-trivial Rust programs that exists. We use very little unsafe/MaybeUninit/C-like code. Do you have an example of a codebase you’re thinking of that does this?

And that's the problem, I do have to make syscalls directly quite often, and so I dislike Rust immensely. There are literally dozens of us at least, but the only people ever talking about Rust on the internet always like to drag C into the conversation for whatever reason even though they are always C++ programmers.
That's fair! If you are doing something low-level enough that the bulk of the work is interfacing directly with a C library (or with the kernel, in the case of syscalls) then C might make more sense than Rust.
OK this is reasonable. Perhaps my experience skews towards the lower-level a bit too much. And it's also reasonable I'm misusing the language given it's not my day job.

To answer your question, I'm referring to much of the networking code in socket2 / socket, which uses MaybeUninit when doing non-standard stuff like forming your own packets. (RAW)

Yep, I definitely buy that if you're doing very low-level stuff, C or C++ might be more ergonomic than Rust. But I don't think that covers most of the real-world use of C++.

I'm not too familiar with `socket2` but normally in Rust to construct a buffer with arbitrary bytes in safe code you would first zero it out and then write it. Using `MaybeUninit` there is presumably just a micro-optimization to avoid having to memset things to zero.
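A rough sketch of the two approaches, for comparison. The function names and the 4-byte "header" layout are invented for illustration and have nothing to do with how `socket2` is actually implemented.

```rust
use std::mem::MaybeUninit;

/// Safe version: zero the whole buffer first, then overwrite what we need.
fn build_packet_safe(payload: &[u8]) -> Vec<u8> {
    let mut buf = vec![0u8; 4 + payload.len()];
    buf[0] = 0x45; // hypothetical first header byte
    buf[4..].copy_from_slice(payload);
    buf
}

/// `MaybeUninit` version: skips the initial zeroing of a fixed-size header.
/// Every element must be written before the array is treated as initialized.
fn build_header_uninit(first: u8) -> [u8; 4] {
    let mut buf: [MaybeUninit<u8>; 4] = [MaybeUninit::uninit(); 4];
    buf[0].write(first);
    for slot in &mut buf[1..] {
        slot.write(0);
    }
    // SAFETY: all four elements were written above, and
    // `MaybeUninit<u8>` has the same layout as `u8`.
    unsafe { std::mem::transmute::<[MaybeUninit<u8>; 4], [u8; 4]>(buf) }
}

fn main() {
    assert_eq!(build_packet_safe(&[1, 2]), vec![0x45, 0, 0, 0, 1, 2]);
    assert_eq!(build_header_uninit(0x45), [0x45, 0, 0, 0]);
}
```

Both produce the same bytes; the second one only saves the initial memset, at the price of an `unsafe` block whose correctness you have to argue by hand.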

C++ makes it very difficult to write safe interfaces. You can't expose references, nor spans, nor variants, nor shared_ptrs to things that can't be thread-safely overwritten, nor any standard library containers nor a lot of other things. And even if you only use whatever few interfaces remain safe, the interfaces you create are unsafe by default too. As a result, these unsafe interfaces are everywhere.

I'd contend that using Rust for anything nontrivial results in MaybeUninit & co being common.

I feel I have to disagree with your (implied) contention that it's feasible to write an API in C++ that, no matter what its inputs are, cannot ever exhibit undefined behaviour.

Because that's what "safe" in Rust means. No memory safety errors, no undefined behaviour.

Who operates crates.io ?
It’s owned by the Rust Foundation.
No I've seen libraries that need the user to use it by default. The wgpu library is one example. It's not even that low level. Rust stuff is a little too safe that it influences code organization and modularity as well.
People are calling for the use of languages like java or python when it is appropriate. Rust is just specifically mentioned (along with Swift, to some degree) when it comes to applications that have a couple of fundamental requirements that prevent the use of other languages. These might be requirements like no pausing for GC or the ability to run without a VM.

Rust (and Swift) are viable languages for solving most problems that people usually reach for C or C++ to solve today and both make it considerably more difficult, by default, to introduce the most common class of serious security vulnerabilities in the modern world.

I don't think C and C++ are that different. I agree that C++ gives you tools to make safer abstractions, but it still gives few tools to enforce these abstractions. For example std::shared_ptr being easy to use is a great improvement as in many cases you can just use it rather than trying to prove that you don't need it so that you don't need to bother implementing your own reference counting.

In C++ vec[999] is a buffer overflow and you can index any pointer even if it isn't supposed to be an array. There are so many easy mistakes that can be made and aren't obvious to a reviewer. Maybe with a very strong linter you can consider C++ very distinct from C, but by default I don't think it is that different.
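For contrast, a small sketch of what the default accessor does in Rust (the helper names are mine). Indexing past the end panics instead of reading adjacent memory, and `get` turns the same condition into an `Option` the caller has to handle.

```rust
/// Checked access: an out-of-range index yields `None`.
fn read_checked(v: &[i32], i: usize) -> Option<i32> {
    v.get(i).copied()
}

/// Plain indexing: an out-of-range index panics at runtime;
/// it never reads or writes memory outside the slice.
fn read_indexed(v: &[i32], i: usize) -> i32 {
    v[i]
}

fn main() {
    let v = vec![10, 20, 30];
    assert_eq!(read_checked(&v, 1), Some(20));
    assert_eq!(read_checked(&v, 999), None);
    assert_eq!(read_indexed(&v, 2), 30);

    // The out-of-bounds index is caught as a panic, not a buffer overflow.
    let result = std::panic::catch_unwind(|| read_indexed(&[10, 20, 30], 999));
    assert!(result.is_err());
}
```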

> I can write a buffer overflow in any language

Try doing it in JavaScript? If so the Mozilla security team would appreciate a private disclosure. Of course it is possible in any non-sandboxed Turing complete language, but there is a huge difference between the default accessor of the most used container type allowing it vs needing to use methods on the `sun.misc.Unsafe` class or wrapping your code in an `unsafe` block. Making code that may cause a buffer overflow explicit is a night and day difference. It means that you can't do it via a typo in the vast majority of your code, and it will grab the attention of your reviewer very quickly. Isolating the part of the code that can cause buffer overflows to a small part greatly raises the attention that is given to those areas, and greatly reduces the chance of them occurring.

I don't think that Java or Rust prevent all buffer overflows, but I also don't think that it is possible to write C or C++ without them. Sure, it is possible to be careful and avoid most of the buffer overflows most of the time, but we and our reviewers are just human so we will never prevent all of the buffer overflows all of the time.

I don't think that this recommendation is under the impression that "memory safe languages" will prevent all buffer overflows, but the idea is that they will greatly reduce the number. In many situations, I would guess the majority of them, this is a good tradeoff.

> It's perfectly simple to trash memory in Rust.

What makes you think so? Most Rust programmers, and programmers from other languages, agree that this is not possible. I might be missing something, but can you give an example of such simple methods to trash memory in Rust? I'm asking out of curiosity.

I would assume they mean through the use of unsafe, which is true, but in practice unsafe code is less common than people that don't write Rust seem to think and tools like Miri help a lot to write unsafe that doesn't write to memory locations you weren't meant to.
Perhaps they aren't writing Rust because those are the people that need to write unsafe code. Chicken and egg. I'm sure if you forced all the C programmers to switch to Rust you would see a lot more use of unsafe.
But there are plenty of projects out there that are written in Rust and have to deal directly with hardware and syscalls. Hubris, a kernel written in Rust has 94 files referencing unsafe[1] out of 414 total .rs files[2]. This is as "bad" a ratio as you're gonna encounter in a project. There are many valid reasons one can have to not use Rust. "I need a lot of unsafe" is not really one.

[1]: https://github.com/search?q=repo%3Aoxidecomputer%2Fhubris+un...

[2]: https://github.com/search?q=repo%3Aoxidecomputer%2Fhubris++l...

I don’t think most Rust programmers agree it’s impossible at all.

There’s always unsafe. I can make a pointer to anywhere by hand and write to it. That would involve some very intentional work, but I could do it if I wanted to.
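Roughly what that looks like, as a sketch with an invented helper name. Creating a raw pointer (even to a made-up address) is allowed in safe code; it is the write through it that has to sit inside `unsafe`. The demo only writes through a pointer derived from a live reference, because writing to an arbitrary address would be undefined behavior.

```rust
/// Writes through a raw pointer. Making the pointer is safe; the write is not.
fn overwrite_through_raw(target: &mut u32, value: u32) {
    let p: *mut u32 = target;
    // SAFETY: `p` comes from a live `&mut u32`, so it is valid and aligned.
    unsafe { p.write(value) };
}

fn main() {
    let mut x = 1u32;
    overwrite_through_raw(&mut x, 42);
    assert_eq!(x, 42);

    // Forging a pointer to an arbitrary address also compiles in safe code...
    let _wild = 0xdead_beef_usize as *mut u32;
    // ...but writing through it needs `unsafe` and would be undefined
    // behavior, so this sketch deliberately never does it.
}
```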

There's a difference between "chamber a round, remove the safety, aim at the foot, shoot" and "open the kitchen faucet, leg gets blown off".
Yes of course. But the GP said it isn’t possible. It is. It’s not even hard.

But I did disclaim that it had to be somewhat intentional.

> I can write a buffer overflow in any language. ... It's perfectly simple to trash memory in Rust.

Not in safe Rust.

You're more right than wrong, but I want to push back just a little. You can write a buffer overflow in safe rust if you store multiple things in the same array and work with indices rather than slices. Of course the risk is bounded by what shares an array, and it's more awkward than doing it any of several right ways. You won't write a buffer overflow in safe rust... but you can if you want to.
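A sketch of the shared-array scenario, with a hypothetical `Arena` type and a deliberately buggy `write_name`. There is no `unsafe` anywhere and no undefined behavior, but because the bounds check is against the whole backing array rather than the logical field, an over-long input spills into the neighbouring field.

```rust
const NAME_START: usize = 0;
const SECRET_START: usize = 4;

/// Hypothetical arena: two logical 4-byte fields packed into one array.
struct Arena {
    storage: [u8; 8], // bytes 0..4 are `name`, bytes 4..8 are `secret`
}

impl Arena {
    fn new(secret: [u8; 4]) -> Self {
        let mut storage = [0u8; 8];
        storage[SECRET_START..].copy_from_slice(&secret);
        Arena { storage }
    }

    /// Buggy on purpose: nothing limits `input` to the 4 bytes of `name`.
    /// Rust only checks each index against `storage.len()`.
    fn write_name(&mut self, input: &[u8]) {
        for (i, &b) in input.iter().enumerate() {
            self.storage[NAME_START + i] = b;
        }
    }

    fn secret(&self) -> &[u8] {
        &self.storage[SECRET_START..]
    }
}

fn main() {
    let mut arena = Arena::new([9, 9, 9, 9]);
    arena.write_name(b"ABCDEF"); // 6 bytes into a 4-byte field
    // The last two bytes landed in `secret`: a logical overflow in safe Rust.
    assert_eq!(arena.secret(), &[b'E', b'F', 9, 9]);
}
```

The damage is bounded by what shares the array: an input longer than 8 bytes would panic rather than touch unrelated memory. Handing out `&mut storage[0..4]` as a slice instead of working with raw indices would have turned the bug into a panic as well.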
This is a bit like saying "you can write a buffer overflow in any turing-complete language, because you can write a C emulator, and then write the buffer overflow in C"
A bit, but in that case the buffer overflow is arguably still "in C" in a way that it isn't in my example.

As I said, you won't write a buffer overflow in rust, but unpacking why can be interesting and it doesn't end at "bounds checks".

> Why are C and C++ considered the same, in these conversations?

Conjunction is not equality. They are both memory unsafe. Then you can argue from that over how memory unsafe they are in practice (using the right practices, using the right language subset).

What I don't understand is the excitement for using Rust vs. using garbage collected languages like Golang, at least for high-level applications (performant or low-level systems applications are excepted here.) My experience is that an experienced programmer can be a lot more productive quickly with Golang, since they don't need to climb the Rust borrow-checking learning curve. Rust doesn't even free you from the need for a runtime or standard library.
> My experience is that an experienced programmer can be a lot more productive quickly with Golang, since they don't need to climb the Rust borrow-checking learning curve.

Will somebody please tell me why everyone seems obsessed with optimizing for programmers going from zero to minimally productive?

I have been using Ruby for twenty years, Rust for eight, golang for nine, and C for twenty-six. Most programmers will use a language for dramatically longer than a year, so why are the first three months such a singular point of focus?

The code I wrote in the first three months of using every one of these languages was bug-ridden, unidiomatic, unnecessarily difficult to maintain, and generally terrible. Ironically, Ruby was probably the least bad in this regard. Go and Rust were probably about the same, but I’d frankly give Rust the slight edge here. C was inarguably the worst, but it was also my first language.

Subjectively and retroactively comparing things a year in, I’d wager my Rust was of the best quality (readability, ease of maintenance, speed of development, bugs per “unit of functionality”), followed by Go, Ruby, and then C. At five years, the quality of my Rust code blows everything else out of the water. My C was still terrible (partly because it was C, partly because it was still my first language). But I’d say Ruby edged out Go at this point for me.

Obviously this is not only anecdata but wildly guesstimated looking back and comparing learning curves on languages at completely different points in my experience as a programmer. I’ll happily admit that Rust pulling so far ahead so quickly is as likely to do with it building off the knowledge of prior decades of professional software engineering. And that my personal experience with any of these languages is of course unique to me and my circumstances.

But it just seems wild to me that people seem to focus on “getting a new person up to speed as fast as possible” to the exclusion of apparently everything else.

Conversion friction. Very important. Arguably the reason why Haskell is not 10-100 times more popular than it currently is; the conversion friction is just too much, and even if all the tooling was perfect and the libraries were perfect and the documentation was perfect it would still have too high a conversion friction to attract a community the size of Go or C# or something.
I can sympathize somewhat with this argument. But it’s also kind of circular to me.

Go being easy to pick up and learn is certainly a virtuous cycle insofar as it helps bootstrap a large community. And that’s absolutely happened!

But that is—in my mind—more of an explanation for why Go has become so popular so quickly more than it is a compelling argument for the language itself. Haskell having conversion friction might explain its lack of adoption, and that’s certainly a great argument in a discussion about why or why not to adopt it for yourself or your team! But it seems like an overvalued axis on which people seem to evaluate languages on their own.

As a counterexample: PHP classically had a reputation as being a language that was very easy for beginners to pick up. And it’s even memory safe! But it also had a reputation for having poor long-term prospects for projects written in it as well as being a limiting factor in the growth of engineers using it (note: I make no claims as to the fairness of this reputation, nor to its applicability on “modern” PHP).

PHP is arguably even easier to learn than Go. So why is it that virtually nobody jumps in these discussions trumpeting that?

From the article: "In contrast, Rust's explicitness in this area not only made things simpler for us but also more correct. If you want to set a file permission code in Rust, you have to explicitly annotate the code as Unix-only. If you don't, the code won't even compile on Windows. This surfacing of complexity helps us understand what our code is doing before we ever ship our software to users."

https://vercel.com/blog/turborepo-migration-go-rust
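For anyone who hasn't seen what the quoted passage is describing, here is a rough sketch. The function name, the `0o755` mode and the temp-file path are my own choices, not taken from the article. The mode-setting API lives in the Unix-only `std::os::unix::fs::PermissionsExt` trait, so code that uses it has to be gated with `#[cfg(unix)]` and you are forced to decide what happens on other platforms.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Unix: set the mode bits explicitly via the Unix-only extension trait.
#[cfg(unix)]
fn make_executable(path: &Path) -> io::Result<()> {
    use std::os::unix::fs::PermissionsExt;
    let mut perms = fs::metadata(path)?.permissions();
    perms.set_mode(0o755);
    fs::set_permissions(path, perms)
}

/// Everything else: there are no Unix mode bits, so this placeholder does
/// nothing. A real program would pick a platform-specific policy here.
#[cfg(not(unix))]
fn make_executable(_path: &Path) -> io::Result<()> {
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("perm_demo.txt");
    fs::write(&path, b"demo")?;
    make_executable(&path)?;

    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        let mode = fs::metadata(&path)?.permissions().mode();
        assert_eq!(mode & 0o777, 0o755);
    }

    fs::remove_file(&path)
}
```

Without the `#[cfg(not(unix))]` variant, a Windows build would fail with an unresolved `make_executable`, which is the "won't even compile" behaviour the article is praising.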

I’m writing a sibling comment to answer the parent’s question directly rather than the meta-argument from my original reply.

> What I don't understand is the excitement for using Rust vs. using garbage collected languages like Golang… since they don't need to climb the Rust borrow-checking learning curve.

Because, in my experience, climbing that learning curve has made me a better programmer more than nearly any other change in my long career. And that benefit has extended to code in every language I write.

The borrow checker isn’t just some hurdle to get in your way; it’s trying to tell you (awkwardly at times and perhaps less helpfully than one would wish) something fundamental about the way you think about and design programs. Internalizing that lesson can bring significant benefits on designing systems with clean boundaries that are easier to test, easier to reason about, and easier to compose.

Besides that, Rust greatly assists you (through features other than the borrow checker) in building software that is correct. This means it will tell you in a much wider variety of scenarios when future code invalidates previous assumptions. This is invaluable for projects that we expect to survive for a long time since the time a project is maintained will dwarf the time it’s under active development. And it will almost certainly be maintained by someone without the full context of the original developer(s). This is true even if the maintainer is the same person who wrote it in the first place, since our mental model of a program bitrots far faster than the program itself.

In practice, this aligns with my personal experience. Go projects end up with a lot of implicit assumptions that are silently violated by future work and expose bugs. They crash on nil pointer derefs. They accrue a multitude of linting tools that usually paper over some of the language’s shortcomings, but only in common cases. And they become painful to maintain as the original developers move on to other projects, with new changes grafted haphazardly into dozens of touch points instead of cleanly in one or two places. Yes, you can “easily” follow what any particular function does, but to do so you have to parse out and mentally model every minute detail, rather than being able to reason at a high level.

>Rust doesn't even free you from the need for a runtime

I always thought that Rust was the only memory-safe language that doesn't need a runtime (beyond the libc that every language links to when running on Unix-like OSes). Maybe you could define what you mean by runtime.

> Rust doesn't even free you from the need for a runtime or standard library.

Could you elaborate on this? Rust doesn't have a runtime (beyond what C has), and I am having trouble understanding what you meant to say about stdlibs.

Rust certainly performs runtime bounds-checking as well as some other tasks, so there is runtime code (even if it's just compiled into executables.) If you want features like async (standard in many language runtimes) you're also going to have to pull in some kind of external runtime dependency. And everyone doing high-level web-style development seems to drag in something like tokio.
> Rust certainly performs runtime bounds-checking as well as some other tasks, so there is runtime code (even if it's just compiled into executables.)

I don't think I've ever seen anyone reference "C with bounds checks enabled" as "having a runtime". Does having stack probes also imply having a runtime? I guess I'd be less surprised if it had been worded as "some mitigations/features have a runtime cost".

> If you want features like async (standard in many language runtimes) you're also going to have to pull in some kind of external runtime dependency.

Yes, you can add a runtime to your application (if you need to use async/await). It has an additional cost over not doing that, but the "promise" is that it is "zero (additional) cost (over what you'd end up with if you wrote the functionality by hand)".

For sure C programs have a runtime. I'm debugging an issue right now that is to do with Windows not shipping VCRUNTIME140_1.DLL on out-of-the-box or old versions of Win10, so in that case it's very clear because you can make C programs that won't start due to a missing runtime library.

The runtime isn't all that large but every OS has one. On UNIX it's spread over libc, libpthread, libgcc, libm and so on.

On Linux stack probes usually have some support code in libgcc and/or glibc, if I recall correctly.

People writing C are the people that really do have to make syscalls directly, or use weird calling conventions, or whatever all the time. I see Rust replacing C++ but I have a hard time seeing it replace C because the people that desired and/or could tolerate safety, like you said, are already not writing C for the most part. That group has been firmly C++ for a long time.
> People writing C are the people that really do have to make syscalls directly,

Not really. See for example desktop Linux (i.e. Gnome).

>Really, the only memory unsafe languages still in use are C and C++.

Ada, Fortran, assembly?

Fortran does not end up where I'm too worried about its security. C and C++ do. At the scale I'm talking about I'm not sure we'd even say Fortran is "in use".

Assembly is in use, yes, but in 2023 I feel there is generally an understanding of the risks and I haven't seen the "write everything in assembly" crew in about 15 years. The problem is that there's still too many programmers blithely using C and C++ without realizing the risks and thinking they can cowboy through the problems. For every line of vulnerable, dangerous assembly I bet there's thousands of lines of C or C++.

There is also the problem that there have been some big bugs that got through even static analysis and fuzz testing, but I'd still be at least reasonably satisfied if all the critical software in C and C++ would be supported by those tools. Interpreters and compilers have had non-zero error rates too.

At the scale I'm talking about, Ada is a non-entity as well. It isn't used. "But it is! I'm a professional Ada programmer!" says someone reading this to themselves. In which case I would say, you darned well know what I mean and don't pretend otherwise just to try to score useless internet points. Ada is not a relevant force on the programming world. That may be sad, but it's true.

SQLite is able to carry most of the justification for C itself at this point.

Duplicating the DO-178B certification that it has obtained in an endorsed language will be an incredible burden for any who attempt it.

Ferrocene has said in the past they plan on going for DO-178 in the future.
But we do not plan on rewriting SQLite, that much I can say :)

(My reading is that the GP points to the exemplary achievement of SQLite having become close to (security) bug-free, through what I consider a nearly superhuman effort.)

That's a good point, I certainly didn't mean to imply that you were!
Objective-C?

Plus everything that needs to directly interface with the above languages. So many Python libraries that are one "funny integer" away from a nightmare debugging session.

Fortran is probably as bad as C, and Ada isn't truly memory safe - they link to Ada/Spark which is but that doesn't seem to have much widespread use.

https://www.adacore.com/about-spark

Isn't Ada/Spark in avionics the main use case for Ada these days? So, huge share of a tiny market?
> Isn't Ada/Spark in avionics the main use case for Ada these days?

Hasn't it always been? I'm no expert but always assumed that it was used basically in military and avionics, and perhaps other safety critical equipment.

I think it's also used in high speed rail.
Also weapon systems.
Fortran is used for scientific computing. The chance of crashing a run on someone's Beowulf cluster is high, but the severity is low.
Fortran doesn’t even have dynamic memory allocation, so it’s inherently safe.
It actually has had this since Fortran-90, and there's even a 'pointer' keyword.
https://www.ibm.com/docs/en/xffbg/121.141?topic=attributes-a...

No pointer direct arithmetic, though. If you allocate the memory via allocate, you can inquire if it's been deallocated.

Isn't Ada memory-safe?
Ada isn't, Ada/SPARK is. That's a subset of Ada, and while it is the main draw of the language for new projects the majority of extant Ada code predates SPARK.
In summary, Ada tries to be memory safe by default -- as far as that can be done without requiring automatic memory management and garbage collection -- but deliberate use of "unchecked" language features can break memory safety.

In other words, if you go out of your way to use unsafe features, and don't use the features that compensate, Ada is memory unsafe. This has become the go-to dismissal of Ada, apparently more popular than "eww...a BEGIN..END language" and "designed by committee/government tainted".

But other languages use C++. I think R, for example, is widely used and has a ton of packages where people write often-buggy C++.
Rust is a little too safe imo. I want a rust with just shared pointers and no move semantics. I guess go would be it? But go is too opinionated with a bunch of stupid go specific philosophies like the weird error handling and the stupid packaging rules.

Go is also opinionated with concurrency. So that's an issue too.

Ocaml then?
I like functional, but this isn't what I'm referring to. OCaml by being functional is opinionated.