Hacker News new | ask | show | jobs
by hedora 2674 days ago
Overall, it did a decent job of being balanced, but I don’t buy the memory overflow example at all.

For one thing, idiomatic C++ bounds checks by default. You need to use at(). If you don’t like typing at(), you can implement an array type that always bounds checks fairly easily. On that note, the vulnerable c++ code should be using accessors, not indexing to access the oddly packed and laid out array. Even the fixed version wouldn’t pass a code review from me. You could write equivalently bad code in any language that supports array types, and get similarly broken results.

For another thing, there’s no evidence that you couldn’t achieve the same improved data structures in C++ using its type system (which is turing complete...)

The “thread safe by default” property sounds interesting; I’d be interested in reading more about that.

6 comments

> For one thing, idiomatic C++ bounds checks by default. You need to use at().

Sounds like it doesn't check by default then. It checks if you remember to check using the more verbose bounds-checking method. Not unlike the issues with subscripting std::map.

>Sounds like it doesn't check by default then.

Yeah, in the phrase "idiomatic C++ bounds checks by default", the term "idiomatic" is a "weasel word" that disqualifies "by default".

I have some experience in C++, and I am familiar enough with the standard library to remember that operator[] doesn't check bounds while the at member function does. I would assume that the Firefox C++ programmers know this as well. However, maybe I'm wrong or have too much faith?

Or, maybe, the programmer was aware that operator[] didn't perform bounds checking, but opted to use it for some reason? A good way to dissuade people from making unidiomatic choices is to make them more verbose. IMO calling the at function isn't particularly verbose, but if the member function that didn't check bounds were called something like "at_unchecked," perhaps people would be less inclined to use it.

Also, from the snippet in the blog post, note that you can't tell whether the Firefox code used std::vector, C-style arrays, or some non-STL container type. Projects may use their own container types, but your criticism only applies if the programmers were using the C++ standard library.

> I have some experience in C++, and I am familiar enough with the standard library to remember that operator[] doesn't check bounds while the at member function does.

Everybody knows you're supposed to check pointers for being null, and yet time and time again developers fail.

As long as you rely on human nature and provide one API which is simple, convenient, obvious and dangerous and one which is complex, inconvenient, non-obvious and safe, you will just drive users towards the former.

> I would assume that the Firefox C++ programmers know this as well. However, maybe I'm wrong or have too much faith?

Just because they know when quizzed doesn't mean they'll always remember when actually doing. Even less so when subscripting is safe in pretty much every other language which provides array subscripting, and ::at… only exists in C++?

> IMO calling the at function isn't particularly verbose

No, but it's still more verbose and less intuitive than [], especially given the above (that tons of languages use [], and very few have an at method)

> A good way to dissuade people from making unidiomatic choices is to make them more verbose.

Indeed.

this is so true

and has been my experience with unwrap

they dont want people to use it but the alternative is so verbose and clunky

The alternative tends to be to propagate the error upwards using the `?` operator, up to some point where it makes sense to handle errors
Aka exceptions.

Yes, chaps, that Result<T,E> type is all but isomorphic with checked exceptions, Java-style.

The relevant code in fact uses a non-STL array container. Unfortunately, the performance of the STL containers is fairly unreliable across C++ standard library implementations and can be very poor in some cases. That makes them harder than it should be to use in code where you need to understand the performance characteristics of your data structures.
Most debug builds across major C++ compilers do support it though.
It is implementation dependent.

operator[]() does not require bounds checking by ISO C++, however most compilers do actually enable bounds checking in debug builds.

Visual C++ certainly does it for example.

Huh. According to https://en.cppreference.com/w/cpp/container/vector/operator_..., “no bounds checking is performed.” I find cppreference.com generally trustworthy, but maybe it’s wrong here? Or, maybe “no bounds checking” actually means “no guaranteed bounds checking?”
It means “no guaranteed bounds checking”, the standard only requires at() to throw if out of bounds (§ 26.2.4.1, note 15), but leaves unspecified how operator[]() should behave in invalid accesses, only that it isn't allowed to throw.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n471...

Here is the Visual C++ documentation for bounds checking in debug builds.

https://docs.microsoft.com/en-us/cpp/standard-library/checke...

You need to look at the text of the Standard itself when it comes to this level of language lawyering. The short answer is that operator[] has undefined behavior if the index is out of range. Performing a check and terminating the program with some kind of runtime error is a legal subset of "undefined behavior", and so it's commonly done in debug builds, but you cannot rely on it in any sense.
This is like saying the Rust doesn't have memory safety because you can use "unsafe". C++ offers you a safe Api and an unsafe API if you want to go that way. If you decide to use the unsafe api the consequences are on you, the same way that it is on you if you decide to use "unsafe" in Rust.
The difference is in the defaults. In Rust, the easiest and more common thing ([]) is checked, and the more verbose and uncommon thing (get_unchecked) is unchecked. In C++, it's the reverse.
> This is like saying the Rust doesn't have memory safety because you can use "unsafe".

It's the exact opposite which is the point. Rust requires extra work and using the non-default and less convenient way for unsafety, C++ requires extra work and using the non-default method and less convenient way for safety.

> You could write equivalently bad code in any language that supports array types, and get similarly broken results.

You wouldn't get "similarly broken results." The results for doing this in C/C++ are far more serious, which was a point the article made.

If you do this in Rust/C#/Java/etc they will safely crash. If you do this in C/C++ it is undefined behavior, it may crash, but it also could allow remote code execution.

The Rust version of this bug is Security-Low (crash), the C++ version is Security-Critical (potential RCE).

If you want to prevent this class a bugs, you don’t need to switch languages is the point. Run a linter that prevents that syntax.
Right, but all you have to do in C++ is switch the array implementation to bounds check by default. This is not rocket science, and is certainly easier than rewriting large code bases from scratch.

Similarly, I could complain that rust arrays are too slow, and produce an array implementation that uses unsafe under the covers.

Dropbox did this in their Rust code: https://github.com/dropbox/rust-brotli-decompressor/blob/mas...

https://github.com/dropbox/rust-brotli-decompressor/blob/mas...

Toggle the feature, remove the bounds checks unconditionally.

> The “thread safe by default” property sounds interesting; I’d be interested in reading more about that.

This property is simply a corollary to Rust's ownership invariants enforced by the borrow checker. There can only be a single live mutable reference to any object, therefore two threads can never hold mutable references to the same object simultaneously. Similarly, all objects must have a lifetime at least as long as the objects which reference them. Therefore no thread can ever hold a stale reference (mutable or immutable) to an object. If code passes Rust's borrow checker it must necessarily be thread-safe.

The borrow checker doesn't need any special knowledge of threading, though AFAIU there are traits that permit the compiler to check that you're using the correct boxing type when passing objects to threads. Objects which implement these and other traits are responsible for maintaining ownership invariants using unsafe code.

TL;DR: Threads cannot share mutable references because no code can share mutable references in Rust. It follows that any Rust code is thread-safe, ignoring bugs within or induced by unsafe code.

This is also the normative approach to writing "thread-safe" code in Unix programming more generally. Most libraries that grew from the Unix culture are only written to be re-entrant--they never hold references to objects shared outside their encapsulation or execution scope. Therefore most such libraries claim thread-safety provided that callers maintain the same re-entrancy invariants for library-defined objects. Without needing to use mutexes such code is thread-safe, you just don't normally get compiler verification. Contrast that with Windows programming or, especially, Java, where the expectation is that objects are shareable and guarantee thread-safety internally. Depending on which programming culture you grew up in, Rust's thread-safety is either obvious ("oh, it's just enforcing re-entrant-safe APIs") or magical ("how does it insert locks in all the right places?").

Caveat lector: I've never written any Rust code.

> Contrast that with Windows programming or, especially, Java, where the expectation is that objects are shareable and guarantee thread-safety internally.

That's not an expectation on either. For example, the single most common phrase you can see on MSDN in class docs is: "Public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe."

There was a period when it was different for some things. In particular, both Java and .NET had thread-safe standard collection classes initially - e.g. Vector in Java, ArrayList in .NET. This has proven to be a bad trade-off in both cases - it's a massive perf hit for something that's not even all that useful even to threaded code, because in practice you often need to perform multiple operations on the collection atomically, and then you still need your own lock.

So they have since been obsoleted by new collections that do not attempt to do any thread synchronization; the old collections remain for backwards compatibility purposes. Modern idiomatic Java or C# code doesn't do any kind of synchronization or locking to protect the caller, unless that is specifically the purpose of this class or function to provide such things.

So I wouldn't say there's significant cultural differences between Windows, Unix and Java in that regard.

> You need to use at()

That's not "by default". std::array_view still hasn't landed so I can't even wrap ptr+size s provided by third party libraries or across standardized C ABIs without going beyond the standard library (really not by default now)... and I believe I have yet to see a single solitary use of at() in a production C++ codebase. Some "default".

For code where overflows are expected, C++ exceptions are way too heavyweight and some other kind of "attempt dereferencing" pattern is used (if only checking the index manually before invoking operator[]).

For code where overflows are unexpected, uncatchable assert-style checks are used.

I've heard these "idiomatic C++" and "improved data structures" arguments a few times and as a C++ developer, I'm a bit skeptical. Does anybody know of any non-toy C++ projects that actually demonstrate such a high-level of reliable use of C++?

If so, how much do they rely on the developers being ideal programmers, who know intimately the intricacies of C++ and how to avoid ending up in UB-land?

I have a codebase that amounts to several kLOC of C++, available here: https://git.sr.ht/~maelkum/viuavm While I openly admit that parts of it are shoddily written, I also try to always use the "reliable subset" of C++.

From my experience it is not hard to make it a habit (e.g. using `::at()` and declaring variables as `auto const x = ...;` is muscle memory at this point) but the code becomes very verbose and looks like programmed defensively to a sometimes ridiculous extent.

So I understand your skepticism. To address the point of relying on developers being "ideal": just turn the compiler warning flags up to eleven, make all warnings errors, and run your tests two times - one time with sanitisers, and the other under Valgrind. This won't catch all errors, sure, but will still make you more confident in your code's reliability.

I'm pretty sure OP meant to write "idiomatic C++ does not bounds check". That's certainly true - the only place I've ever seen at() in use is in textbooks, never in production code. It doesn't really make much sense, because the moment you start using iterators, it's all unchecked anyway, so why bother with at() specifically?