by rkangel 2674 days ago
It's nice to see a balanced, real world, case study including 'these things are fixed by Rust', 'these are problems that don't occur in idiomatic Rust', and 'these are problems that Rust can't help you with'.

I'm a big fan of Rust, but the one sided 'Rust makes all the problems go away' articles don't provide any value.


It also highlights a security bug introduced during the rewrite, a reminder that rewriting any significantly large piece of software is bound to introduce bugs.
Not just introducing new bugs but reintroducing old bugs which were publicly documented and/or previously exploited. You could argue those are worse, since the barrier to detection is lower for attackers, but on the other hand it is also lower for the team/community.

Also of note: there was already an automated test for one of the high-priority bugs that got reintroduced, but that particular test was turned off.

What confuses me about this is the tests were turned off because they were taking too long. But wouldn't the appropriate behavior there be "run a subset of the tests normally, but run the full test suite occasionally" rather than just disabling the tests completely?
Or turn off some during development but run the whole suite before release?
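
For what it's worth, Rust's built-in test harness supports this kind of split directly: a test marked `#[ignore]` is skipped by a plain `cargo test` but still compiled, and runs with `cargo test -- --ignored`. A minimal sketch, where `checksum` and both test names are hypothetical stand-ins and not anything from the Firefox code:

```rust
/// Hypothetical function under test (illustration only).
pub fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

// Fast test: runs on every `cargo test`.
#[test]
fn checksum_of_empty_input_is_zero() {
    assert_eq!(checksum(&[]), 0);
}

// Slow test: skipped by default, but still compiled so it cannot rot silently.
// Run it with `cargo test -- --ignored`, e.g. nightly or before a release.
#[test]
#[ignore = "slow; run with `cargo test -- --ignored`"]
fn checksum_is_deterministic_on_large_input() {
    let data = vec![0xAB_u8; 10_000_000];
    assert_eq!(checksum(&data), checksum(&data));
}
```

`cargo test -- --include-ignored` runs both groups in one go, which would fit a pre-release job.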

I have a feeling that a bunch of the tests in that particular category needed to be updated, so it wasn't simply that they took too long.

Pretty clear that the QA process is at fault here, which is sadly common in the software industry.
These are all ways that Rust is neutral or better. Were there any ways that Rust was worse? For example, did the previous code use value-type templates? If so how was their absence worked around in the new code?
Chucklefish made some buzz a few months ago saying they were writing their new game in Rust; they have now stopped and gone back to C++.

https://www.reddit.com/r/programming/comments/atyzz4/halley_...

They're still using Rust for some stuff, just not for the game itself. Wargroove's matchmaking server is still in Rust. That said, yes, this is a good example of what your parent is asking for.

https://www.reddit.com/r/programming/comments/atyzz4/halley_... has some more context too

Compile time. For my 9k-line project it takes 20 seconds to compile in debug mode and 1 minute in release, which is insane because I have to do a lot of runs to test things (where cargo check can't help).
In spite of this, I'm in love with Rust.
If you modify a single line and rebuild, how much time does it take, on average?
I tried changing some constants in random places (one line each time) and on average it's ~10 seconds. Also, the compilation unit in Rust is a crate. I guess if your app can be split into separate crates, that can ease the pain.
cargo check is 3.11s
Overall, it did a decent job of being balanced, but I don’t buy the memory overflow example at all.

For one thing, idiomatic C++ bounds checks by default. You need to use at(). If you don’t like typing at(), you can implement an array type that always bounds checks fairly easily. On that note, the vulnerable C++ code should be using accessors, not indexing, to access the oddly packed and laid out array. Even the fixed version wouldn’t pass a code review from me. You could write equivalently bad code in any language that supports array types, and get similarly broken results.

For another thing, there’s no evidence that you couldn’t achieve the same improved data structures in C++ using its type system (which is Turing complete...)

The “thread safe by default” property sounds interesting; I’d be interested in reading more about that.

> For one thing, idiomatic C++ bounds checks by default. You need to use at().

Sounds like it doesn't check by default then. It checks if you remember to check using the more verbose bounds-checking method. Not unlike the issues with subscripting std::map.

>Sounds like it doesn't check by default then.

Yeah, in the phrase "idiomatic C++ bounds checks by default", the term "idiomatic" is a "weasel word" that disqualifies "by default".

I have some experience in C++, and I am familiar enough with the standard library to remember that operator[] doesn't check bounds while the at member function does. I would assume that the Firefox C++ programmers know this as well. However, maybe I'm wrong or have too much faith?

Or, maybe, the programmer was aware that operator[] didn't perform bounds checking, but opted to use it for some reason? A good way to dissuade people from making unidiomatic choices is to make them more verbose. IMO calling the at function isn't particularly verbose, but if the member function that didn't check bounds were called something like "at_unchecked," perhaps people would be less inclined to use it.

Also, from the snippet in the blog post, note that you can't tell whether the Firefox code used std::vector, C-style arrays, or some non-STL container type. Projects may use their own container types, but your criticism only applies if the programmers were using the C++ standard library.

> I have some experience in C++, and I am familiar enough with the standard library to remember that operator[] doesn't check bounds while the at member function does.

Everybody knows you're supposed to check pointers for being null, and yet time and time again developers fail.

As long as you rely on human nature and provide one API which is simple, convenient, obvious and dangerous and one which is complex, inconvenient, non-obvious and safe, you will just drive users towards the former.

> I would assume that the Firefox C++ programmers know this as well. However, maybe I'm wrong or have too much faith?

Just because they know when quizzed doesn't mean they'll always remember when actually doing. Even less so when subscripting is safe in pretty much every other language which provides array subscripting, and ::at… only exists in C++?

> IMO calling the at function isn't particularly verbose

No, but it's still more verbose and less intuitive than [], especially given the above (that tons of languages use [], and very few have an at method)

> A good way to dissuade people from making unidiomatic choices is to make them more verbose.

Indeed.

this is so true

and has been my experience with unwrap

they don't want people to use it but the alternative is so verbose and clunky

The alternative tends to be to propagate the error upwards using the `?` operator, up to some point where it makes sense to handle errors
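
A small sketch of that contrast, in case it helps. `ConfigError`, `port_from_line` and the `port=` line format are invented for illustration; only `?`, `ok_or`, `From` and `unwrap` are the real mechanisms:

```rust
use std::num::ParseIntError;

/// Hypothetical error type for the example.
#[derive(Debug)]
pub enum ConfigError {
    MissingPort,
    BadPort(ParseIntError),
}

// Lets `?` convert a ParseIntError into a ConfigError automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::BadPort(e)
    }
}

/// The `unwrap` version: short, but panics on any malformed input.
pub fn port_from_line_unwrap(line: &str) -> u16 {
    line.strip_prefix("port=").unwrap().trim().parse().unwrap()
}

/// The `?` version: each failure is passed up to the caller instead.
pub fn port_from_line(line: &str) -> Result<u16, ConfigError> {
    let value = line.strip_prefix("port=").ok_or(ConfigError::MissingPort)?;
    let port = value.trim().parse::<u16>()?;
    Ok(port)
}

/// Somewhere higher up, where handling the error makes sense.
pub fn describe(line: &str) -> String {
    match port_from_line(line) {
        Ok(port) => format!("listening on {}", port),
        Err(e) => format!("config error: {:?}", e),
    }
}
```

The `?` version costs one extra line per fallible step plus the error type, which is arguably where the "verbose and clunky" impression comes from.
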
The relevant code in fact uses a non-STL array container. Unfortunately, the performance of the STL containers is fairly unreliable across C++ standard library implementations and can be very poor in some cases. That makes them harder than it should be to use in code where you need to understand the performance characteristics of your data structures.
Most debug builds across major C++ compilers do support it though.
It is implementation dependent.

operator[]() does not require bounds checking by ISO C++, however most compilers do actually enable bounds checking in debug builds.

Visual C++ certainly does it for example.

Huh. According to https://en.cppreference.com/w/cpp/container/vector/operator_..., “no bounds checking is performed.” I find cppreference.com generally trustworthy, but maybe it’s wrong here? Or, maybe “no bounds checking” actually means “no guaranteed bounds checking?”
It means “no guaranteed bounds checking”, the standard only requires at() to throw if out of bounds (§ 26.2.4.1, note 15), but leaves unspecified how operator[]() should behave in invalid accesses, only that it isn't allowed to throw.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n471...

Here is the Visual C++ documentation for bounds checking in debug builds.

https://docs.microsoft.com/en-us/cpp/standard-library/checke...

You need to look at the text of the Standard itself when it comes to this level of language lawyering. The short answer is that operator[] has undefined behavior if the index is out of range. Performing a check and terminating the program with some kind of runtime error is a legal subset of "undefined behavior", and so it's commonly done in debug builds, but you cannot rely on it in any sense.
This is like saying that Rust doesn't have memory safety because you can use "unsafe". C++ offers you a safe API and an unsafe API if you want to go that way. If you decide to use the unsafe API, the consequences are on you, the same way that it is on you if you decide to use "unsafe" in Rust.
The difference is in the defaults. In Rust, the easiest and more common thing ([]) is checked, and the more verbose and uncommon thing (get_unchecked) is unchecked. In C++, it's the reverse.
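
A short sketch of the three access forms being contrasted. The helper names are made up for illustration; `[]`, `get` and `get_unchecked` are the standard slice APIs:

```rust
/// Default indexing: bounds-checked, panics if `i` is out of range.
pub fn indexed(v: &[i32], i: usize) -> i32 {
    v[i]
}

/// Checked access without the panic: out of range yields `None`.
pub fn checked(v: &[i32], i: usize) -> Option<i32> {
    v.get(i).copied()
}

/// Opt-out: no bounds check at all.
///
/// # Safety
/// The caller must guarantee `i < v.len()`; otherwise this is undefined behavior.
pub unsafe fn unchecked(v: &[i32], i: usize) -> i32 {
    // The longer name and the `unsafe` keyword both make this stand out in review.
    unsafe { *v.get_unchecked(i) }
}
```

Note that the call site of `unchecked` needs its own `unsafe { ... }` block as well, so the opt-out is visible twice.
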
> This is like saying that Rust doesn't have memory safety because you can use "unsafe".

It's the exact opposite which is the point. Rust requires extra work and using the non-default and less convenient way for unsafety, C++ requires extra work and using the non-default method and less convenient way for safety.

> You could write equivalently bad code in any language that supports array types, and get similarly broken results.

You wouldn't get "similarly broken results." The results for doing this in C/C++ are far more serious, which was a point the article made.

If you do this in Rust/C#/Java/etc they will safely crash. If you do this in C/C++ it is undefined behavior, it may crash, but it also could allow remote code execution.

The Rust version of this bug is Security-Low (crash), the C++ version is Security-Critical (potential RCE).
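
A sketch of what "safely crash" looks like in practice, assuming the default `panic = "unwind"` setting. `read_at` and `out_of_bounds_read_panics` are hypothetical names, and `catch_unwind` is used here only to observe the panic, not as a recommended error-handling style:

```rust
use std::panic;

/// Plain indexing on a slice: the bounds check is part of `[]`.
pub fn read_at(data: &[u8], index: usize) -> u8 {
    data[index]
}

/// Returns true if reading `index` panicked instead of returning a byte.
pub fn out_of_bounds_read_panics(data: &[u8], index: usize) -> bool {
    // An out-of-range index unwinds with a panic here; it never reads
    // whatever happens to sit in memory past the end of the slice.
    panic::catch_unwind(|| read_at(data, index)).is_err()
}
```

The panic message still goes to stderr when this runs. With `panic = "abort"` the process terminates instead, which is the "crash" end of the same controlled outcome.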

The point is that if you want to prevent this class of bugs, you don’t need to switch languages. Run a linter that prevents that syntax.
Right, but all you have to do in C++ is switch the array implementation to bounds check by default. This is not rocket science, and is certainly easier than rewriting large code bases from scratch.

Similarly, I could complain that rust arrays are too slow, and produce an array implementation that uses unsafe under the covers.

Dropbox did this in their Rust code: https://github.com/dropbox/rust-brotli-decompressor/blob/mas...

https://github.com/dropbox/rust-brotli-decompressor/blob/mas...

Toggle the feature, remove the bounds checks unconditionally.
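
For readers who have not seen the pattern, here is a rough sketch of such a wrapper. It is not the Dropbox code: `FastSlice` and `sum_all` are invented names, and a `debug_assert!` stands in for the feature toggle as an assumption, so the check exists in debug builds and disappears in release builds:

```rust
use std::ops::Index;

/// Hypothetical wrapper whose `[]` skips the bounds check in release builds.
pub struct FastSlice<'a, T>(pub &'a [T]);

impl<'a, T> Index<usize> for FastSlice<'a, T> {
    type Output = T;

    fn index(&self, i: usize) -> &T {
        // Checked in debug builds only; compiled out when debug assertions are off.
        debug_assert!(i < self.0.len(), "index {} out of bounds", i);
        // WARNING: in a release build an out-of-range `i` is undefined behavior.
        // This safe-looking API is unsound, which is exactly the trade being made.
        unsafe { self.0.get_unchecked(i) }
    }
}

/// Example use: sums every element through the unchecked `[]`.
pub fn sum_all(s: &FastSlice<'_, i32>) -> i32 {
    (0..s.0.len()).map(|i| s[i]).sum()
}
```

The difference from C++ is arguably one of visibility: the `unsafe` block is greppable and sits in one place, while every caller keeps writing ordinary `[]`.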

> The “thread safe by default” property sounds interesting; I’d be interested in reading more about that.

This property is simply a corollary to Rust's ownership invariants enforced by the borrow checker. There can only be a single live mutable reference to any object, therefore two threads can never hold mutable references to the same object simultaneously. Similarly, all objects must have a lifetime at least as long as the objects which reference them. Therefore no thread can ever hold a stale reference (mutable or immutable) to an object. If code passes Rust's borrow checker it must necessarily be thread-safe.

The borrow checker doesn't need any special knowledge of threading, though AFAIU there are traits that permit the compiler to check that you're using the correct boxing type when passing objects to threads. Objects which implement these and other traits are responsible for maintaining ownership invariants using unsafe code.

TL;DR: Threads cannot share mutable references because no code can share mutable references in Rust. It follows that any Rust code is thread-safe, ignoring bugs within or induced by unsafe code.
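
A minimal sketch of both halves of that argument, using only the standard library. The function names are made up; the traits alluded to above are `Send` and `Sync`, which is why the `Arc<Mutex<_>>` in the second function is accepted by `thread::spawn`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared *immutable* borrows can go to several threads at once.
pub fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    // Scoped threads may borrow locals because they are joined before `scope` returns.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}

/// Shared *mutable* state has to go through a type that synchronizes access.
pub fn count_with_mutex(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The only route to the integer is through the lock guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

Replacing the `Mutex<u64>` with a plain `u64` captured by mutable reference in several threads is rejected at compile time, which is the property being described.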

This is also the normative approach to writing "thread-safe" code in Unix programming more generally. Most libraries that grew from the Unix culture are only written to be re-entrant--they never hold references to objects shared outside their encapsulation or execution scope. Therefore most such libraries claim thread-safety provided that callers maintain the same re-entrancy invariants for library-defined objects. Without needing to use mutexes such code is thread-safe, you just don't normally get compiler verification. Contrast that with Windows programming or, especially, Java, where the expectation is that objects are shareable and guarantee thread-safety internally. Depending on which programming culture you grew up in, Rust's thread-safety is either obvious ("oh, it's just enforcing re-entrant-safe APIs") or magical ("how does it insert locks in all the right places?").

Caveat lector: I've never written any Rust code.

> Contrast that with Windows programming or, especially, Java, where the expectation is that objects are shareable and guarantee thread-safety internally.

That's not an expectation on either. For example, the single most common phrase you can see on MSDN in class docs is: "Public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe."

There was a period when it was different for some things. In particular, both Java and .NET had thread-safe standard collection classes initially - e.g. Vector in Java, ArrayList in .NET. This has proven to be a bad trade-off in both cases - it's a massive perf hit for something that's not even all that useful even to threaded code, because in practice you often need to perform multiple operations on the collection atomically, and then you still need your own lock.
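
The "multiple operations atomically" point can be sketched in Rust terms (not Java or .NET): a check followed by an insert is only correct if both happen under one lock, which a collection with per-method locking cannot give you. `insert_if_absent` and `race_to_insert` are hypothetical helpers:

```rust
use std::sync::Mutex;
use std::thread;

/// Check-then-insert as one atomic step: the guard is held across both operations.
pub fn insert_if_absent(list: &Mutex<Vec<u32>>, value: u32) -> bool {
    let mut guard = list.lock().unwrap();
    if guard.contains(&value) {
        false
    } else {
        guard.push(value);
        true
    }
}

/// Several threads race to insert the same values; returns the final length.
pub fn race_to_insert(threads: usize, values: u32) -> usize {
    let list = Mutex::new(Vec::new());
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for v in 0..values {
                    insert_if_absent(&list, v);
                }
            });
        }
    });
    // No duplicates are possible, because no thread can slip in between
    // the `contains` check and the `push`.
    let len = list.lock().unwrap().len();
    len
}
```

If `contains` and `push` each took the lock separately, two threads could both see "absent" and both push, which is the gap the caller-side lock closes.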

So they have since been obsoleted by new collections that do not attempt to do any thread synchronization; the old collections remain for backwards compatibility purposes. Modern idiomatic Java or C# code doesn't do any kind of synchronization or locking to protect the caller, unless that is specifically the purpose of this class or function to provide such things.

So I wouldn't say there's significant cultural differences between Windows, Unix and Java in that regard.

> You need to use at()

That's not "by default". std::array_view still hasn't landed so I can't even wrap ptr+size s provided by third party libraries or across standardized C ABIs without going beyond the standard library (really not by default now)... and I believe I have yet to see a single solitary use of at() in a production C++ codebase. Some "default".

For code where overflows are expected, C++ exceptions are way too heavyweight and some other kind of "attempt dereferencing" pattern is used (if only checking the index manually before invoking operator[]).

For code where overflows are unexpected, uncatchable assert-style checks are used.

I've heard these "idiomatic C++" and "improved data structures" arguments a few times and as a C++ developer, I'm a bit skeptical. Does anybody know of any non-toy C++ projects that actually demonstrate such a high-level of reliable use of C++?

If so, how much do they rely on the developers being ideal programmers, who know intimately the intricacies of C++ and how to avoid ending up in UB-land?

I have a codebase that amounts to several kLOC of C++, available here: https://git.sr.ht/~maelkum/viuavm While I openly admit that parts of it are shoddily written, I also try to always use the "reliable subset" of C++.

From my experience it is not hard to make it a habit (e.g. using `::at()` and declaring variables as `auto const x = ...;` is muscle memory at this point) but the code becomes very verbose and looks defensively programmed to a sometimes ridiculous extent.

So I understand your skepticism. To address the point of relying on developers being "ideal": just turn the compiler warning flags up to eleven, make all warnings errors, and run your tests two times - one time with sanitisers, and the other under Valgrind. This won't catch all errors, sure, but will still make you more confident in your code's reliability.

I'm pretty sure OP meant to write "idiomatic C++ does not bounds check". That's certainly true - the only place I've ever seen at() in use is in textbooks, never in production code. It doesn't really make much sense, because the moment you start using iterators, it's all unchecked anyway, so why bother with at() specifically?
Yeah I keep hearing / reading about rust, even seen some demos but the demos all end with "oh no I'm not using this for anything". Still cool but ... want to see someone doing something in production / get their thoughts on that.

Edit: To be clear I'm not saying anyone isn't using it, this is just more of a comment about the impression I can get when I hear about X tech is so cool, but that's most of what I hear and at some point I want to see those same articles about real world use / experiences. I'm aware Mozilla and others are using it.

>Yeah I keep hearing / reading about rust, even seen some demos but the demos all end with "oh no I'm not using this for anything". Still cool but ... want to see someone doing something in production / get their thoughts on that.

Huh? Besides Mozilla itself using it in the browser in several backends, there are tons of places where it's used in production (and several articles on HN on the topic).

Where do you see all those demos that say "I'm not using this for anything"?

I've seen articles on Mozilla using Rust, Dropbox using Rust, MS using Rust (ripgrep used in VSC but also something in Azure IIRC), Google using Rust (Google Fuchsia), Facebook using Rust (Mononoke, etc), and so on.

Amazon is using Rust - Firecracker powers Lambda and Fargate: https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir...
NPM is using Rust on the backend
I believe Discord is using Rust for their games store.
MS's security division is also proposing a trio of C#, Rust and constrained C++ (meaning no C-isms) for their software going forward.
I'm using it for a not-yet-released product. Roughly 30k lines of code running on Windows, macOS, and Linux. Mostly high volume event processing and parsing.

Been a lot of fun to work with. Having spent 10 years doing C++ professionally before, I can't see myself ever looking back.

Once the product is public later this year, I may see if I can talk more publicly about it.

That sounds like it would be very interesting.
We're doing a low level core functionality lib in Rust that is shared between Windows desktop, iOS and Android apps.

It works really well.

If we were to do it again, the core Rust lib would do much more and the platform native code would only be the things that need to be native, like UI and notifications.

Dropbox is using Rust to decompress Brotli: https://blogs.dropbox.com/tech/2016/06/lossless-compression-...
Thank you.
Our product is basically written in Rust (~40k LoC) [1]. When I say basically: there are a few backend components that are written in Java (i.e, ActiveMQ) and the frontend is TypeScript/React, but it is mostly Rust.

[1]: https://www.schoolbench.com.au/

Firefox?
I'm not saying nobody is using it, just the volume of "hey this is so cool" gets to be a bit much and I want to see more "here it is solving problems in production for a real thing".
The number of explorations, small demos and toy projects in a language is almost always higher than the number of production uses; it's just that in popular languages nobody cares to read much about them (or at least share them so they spread), because it's all already known. Rust is still fairly uncommon, so you're bound to see a lot of people announcing their experiences while playing around with it, and that can skew the perceived ratio of things.

I think the problem is solved by paying less attention to the stuff you care about less, and/or seeking out more production(y) uses if desired by looking in places where it's already common, such as /r/rust, where the content ratio will naturally fall more favorably towards production uses, libraries and advanced discussions since a minimum level of experience (or at least interest) can be assumed.