And there's a tired security crowd watching Rust with great hope; C++ and C have created innumerable security holes at the expense of "convenience". Cryptographic libraries, codec libraries, image conversion libraries, OS kernels, sandboxes, virtual machines, browsers, (the list is endless) have all suffered glaring security holes from the lack of memory hygiene afforded by C and C++.
Any time your code takes in untrusted input, it should not be written in an unsafe language.
Exactly. Which is why I've been so critical, in Rust discussions, of the excessive use of "unsafe". The reply is usually something equivalent to "it's not unsafe the way I do it". Sometimes the claimed performance gain isn't there. I had a link yesterday to a forum post where someone was complaining that using an unsafe vector access function didn't speed up their program. Optimizer 1, programmer 0.
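For anyone who hasn't seen what that kind of "unsafe vector access" looks like, here is a minimal sketch. It is not the code from the forum post; the function names are made up for illustration. All three versions compute the same sum. Whether the bounds checks in the first are actually removed depends on the compiler version and optimization level, so measure before assuming either way.

```rust
/// Sums a slice with ordinary, bounds-checked indexing.
fn sum_checked(v: &[u64]) -> u64 {
    let mut total = 0u64;
    for i in 0..v.len() {
        // `i < v.len()` is evident from the loop bound, which is the kind
        // of check an optimizer can often prove redundant.
        total = total.wrapping_add(v[i]);
    }
    total
}

/// Same loop using `get_unchecked`; the result is identical.
fn sum_unchecked(v: &[u64]) -> u64 {
    let mut total = 0u64;
    for i in 0..v.len() {
        // SAFETY: `i` is always in `0..v.len()`.
        total = total.wrapping_add(unsafe { *v.get_unchecked(i) });
    }
    total
}

/// Iterator form: there is no index, so there is nothing to check.
fn sum_iter(v: &[u64]) -> u64 {
    v.iter().fold(0u64, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    assert_eq!(sum_checked(&data), sum_unchecked(&data));
    assert_eq!(sum_checked(&data), sum_iter(&data));
    println!("{}", sum_iter(&data));
}
```

The iterator version is usually the first thing to try before reaching for unsafe.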
(Early in my career, I spent four years doing maintenance programming for a mainframe OS. Every time a machine crashed, taking a few hundred users off line for several minutes, I got a crash dump, which I had to analyze and fix. Most of the errors were pointer problems in assembly code. When Pascal came out, I thought we were past that. Then came C. I had hope for SafeMesa, but nobody outside PARC used it. I had hope for Modula I/II/III, but DEC went under. I had hope for Ada, but it was considered a complex language back then. Rust finally offers a way out of this hole. Don't fuck up this chance.)
I am still skeptical that "excessive use of unsafe" is actually a thing happening in Rust. Almost all the unsafe I see is for doing FFI (either for interfacing with a library or OS primitives). There's a bunch of it for implementing datastructures and stuff, and extremely little unsafe being used "for performance". Off the top of my head nom and regex do this in a few places, and that's about it. Grepping through my cargo cache dir seems to support my assertion; most of the crates there are FFI (vast majority is FFI) or abstractions like parking_lot/crossbeam/petgraph.
I agree that we should avoid unsafe as much as possible and be sure that unsafe blocks are justifiable (with stringent criteria on justification). I don't think this is currently a problem in the community.
You keep making that claim without backup. Two days ago I posted links to extensive use of "unsafe" in matrix libraries. (Some of that code was clearly transliterated from C. Raw pointers all over the place.) That's entirely for performance; all that code could be safe, at some performance penalty.
I'd suggest using only safe code for whatever matrix/math library gets some traction, and then beating on the optimizer people to optimize out more checks.
I just gave you backup; I grepped my whole .cargo cache dir (both the one used by servo and my global one). You have also made your claim without backup -- you have repeatedly claimed that this is an endemic problem in Rust, with only individual crates (most of them obscure ones) to back it up, and I only usually make my claim in response to claims like yours -- the burden of proof is on you. Anyway, I do provide some more concrete data below, so this isn't something we should argue about.
Matrices fall under the abstraction umbrella IMO. This is precisely what unsafe code is for. However, I totally agree that we should be fixing this in the optimizer, with some caveats. I'm surprised it doesn't get optimized already for stack-allocated matrices. I'm wary of adding overly specific optimizations, because an optimization is as unsafe as an unsafe block anyway; it just exists at a different point of the pipeline. If there's a general optimization that can make it work I'm all for it (for known-size matrices I think there should be), but if you need a specific optimization for the use case, IMO it's just better to use unsafe code.
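To make the known-size case concrete, here is a sketch of a fully safe matrix multiply over fixed-size arrays. It uses const generics, and `matmul` is a hypothetical name, not taken from any of the crates discussed. Every loop bound is a compile-time constant equal to the array length, so the index checks are provably redundant. Whether a given compiler version actually removes them is an assumption that would need to be checked in the generated code.

```rust
/// Multiplies an N x M matrix by an M x P matrix using only safe indexing.
fn matmul<const N: usize, const M: usize, const P: usize>(
    a: &[[f64; M]; N],
    b: &[[f64; P]; M],
) -> [[f64; P]; N] {
    let mut out = [[0.0; P]; N];
    for i in 0..N {
        for j in 0..P {
            for k in 0..M {
                // All three indices are bounded by the array lengths
                // encoded in the types.
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

fn main() {
    let a = [[1.0, 2.0], [3.0, 4.0]];
    let identity = [[1.0, 0.0], [0.0, 1.0]];
    // Multiplying by the identity leaves the matrix unchanged.
    assert_eq!(matmul(&a, &identity), a);

    let b = [[5.0, 6.0], [7.0, 8.0]];
    println!("{:?}", matmul(&a, &b));
}
```

Dimension mismatches become type errors here, which removes one class of runtime check entirely.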
The raw pointers thing is a problem, but bad crates exist. They don't get used.
I recently started auditing my cargo cache dir to look for bad usages of unsafe, especially unchecked indexing, after your recent comments -- I wanted to be sure. This is what I have so far: https://gist.github.com/Manishearth/6a9367a7d8772e095629e821...
That's a list of only the crates containing unsafe code in my global cargo cache (this contains most, but not all, of the crates used by servo -- my servo builds use a separate cargo cache for obsolete reasons, but most of those deps make it into the global cache too whenever I work on a servo dep out of tree)
I've removed dupe crates from the list. I have around 600 total crates in my cache dir, these are just the ones containing unsafe code.
Around 70 of these crates use unsafe for FFI. Around 30 are abstractions like crossbeam and rayon and graphs.
I was surprised at the number of crates using unchecked indexing and unchecked utf8. I suspected it would be less than 10, but it's more like 20. Still, not too bad. It's usually one or two instances of this per crate. That's quite manageable IMO. Though you may want to be stricter about this and consider those numbers to be problematic, which I understand.
I bet you're right that many of these crates can have the unchecked indexing or other unsafe code removed (or, the perf penalty is not important anyway). I probably should look into this at some point. Thanks for bringing this to my attention!
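A rough way to reproduce this kind of audit is a plain textual scan. The sketch below is not the script behind the gist; the function names and the default directory are assumptions for illustration. It counts the word `unsafe` in every `.rs` file under a directory. Being purely textual, it also counts hits in comments and string literals, and it says nothing about why the unsafe is there, so sorting crates into FFI, abstractions, and "for performance" still takes manual reading.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Counts occurrences of the whole word `unsafe` in a source string.
/// Purely textual: comments and string literals are counted too.
fn count_unsafe(src: &str) -> usize {
    src.split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|word| *word == "unsafe")
        .count()
}

/// Recursively walks `dir` and adds up the counts for every `.rs` file.
fn count_in_dir(dir: &Path) -> io::Result<usize> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            total += count_in_dir(&path)?;
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            // Skip files that cannot be read as UTF-8 text instead of failing.
            if let Ok(src) = fs::read_to_string(&path) {
                total += count_unsafe(&src);
            }
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Hypothetical default: pass the directory to scan as the first argument.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    println!("{}", count_in_dir(Path::new(&root))?);
    Ok(())
}
```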
"itoa" is clearly premature optimization. That uses an old hack appropriate to machines where integer divide was really expensive, like an Arduino-class CPU. It's unlikely to help much on anything with a modern divide unit.
"httpparse", "idna", "serde-json", and "inflate" should be made 100% safe - they all take external input, are used in web-facing programs, and are classic attack vectors.
Not much use of number-crunching libraries; that reflects what you do.
I'll look at some more later. How to deal effectively with incoming UTF-8, especially bad UTF-8, may need some thinking.
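For reference, the standard library already offers two safe ways to handle incoming bytes that may not be valid UTF-8: reject them, or replace the bad sequences. A minimal sketch follows; the wrapper function names are made up, while `std::str::from_utf8` and `String::from_utf8_lossy` are the real standard library calls.

```rust
/// Strict: reject the input outright if it is not valid UTF-8.
fn parse_strict(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Lenient: replace each invalid sequence with U+FFFD.
fn parse_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    let good = b"hello";
    let bad = b"he\xFFllo"; // 0xFF never appears in valid UTF-8

    assert_eq!(parse_strict(good), Ok("hello"));
    assert!(parse_strict(bad).is_err());
    assert_eq!(parse_lossy(bad), "he\u{FFFD}llo");

    if let Err(e) = parse_strict(bad) {
        // The error reports how many leading bytes were valid.
        println!("valid up to byte {}", e.valid_up_to());
    }
}
```

Both paths pay for a validation pass, which is the cost the unchecked conversions try to avoid.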
There's also the tired sysadmin crowd who are tired of rebooting thousands of hosts for kernel, shell, libc, etc. patches. And tired of patching web, mail, DNS, etc. servers. I'm sure there are really smart C and/or C++ developers out there who never make mistakes, but I've spent a large part of my career patching/upgrading really smart people's code.
For me, safety is the killer feature in Rust. It's also exciting because it brings systems level programming to a new generation of programmers without all the risk.
I agree, but people seem to feel that their code should somehow be exempt from such advice, and so sacrifice safety for performance. This leads to today's sorry state of affairs.
The problem is that safety doesn't sell. If you're getting a new IoT heat lamp you look at the price and not the firmware's code. To your surprise, the first hacker coming along toasts your cat.
Rust may ultimately be the better solution for many or most cases, but right now SaferCPlusPlus[1] may be the more expedient solution for existing C/C++ code bases.
> Any time your code takes in untrusted input, it should not be written in an unsafe language.
Not just that, but my theory is that untrusted input should only be stored in data types specifically designed for untrusted input [2], and should undergo safety/sanity checks during conversion to more high-performance types. For example, a general rule might be that untrusted integer inputs may only be converted to (high-performance) native integers if their value is less than the square root of the max integer value.
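A sketch of that rule, written in Rust rather than with the SaferCPlusPlus types, and with hypothetical names throughout (`Untrusted`, `LIMIT`, `into_native`). It assumes the native type is `u32`, so the cutoff is the floor of the square root of `u32::MAX`, which is 65,535. The point of the square-root bound is that the product of any two accepted values still fits in the native type.

```rust
/// Wrapper marking an integer that came from outside and has not been vetted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Untrusted(u64);

/// Largest accepted value: floor(sqrt(u32::MAX)) = 65_535.
const LIMIT: u64 = 65_535;

impl Untrusted {
    /// Converts to a native `u32` only if the value is below the
    /// square root of `u32::MAX`; otherwise returns `None`.
    fn into_native(self) -> Option<u32> {
        if self.0 <= LIMIT {
            // Lossless: the value was just checked to fit in 16 bits.
            Some(self.0 as u32)
        } else {
            None
        }
    }
}

fn main() {
    let a = Untrusted(40_000).into_native();
    let b = Untrusted(70_000).into_native();
    assert_eq!(a, Some(40_000));
    assert_eq!(b, None);

    if let Some(x) = a {
        // Cannot overflow: both factors are at most 65_535.
        println!("{}", x * x);
    }
}
```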