Hacker News new | ask | show | jobs
by TillE 791 days ago
Storing a pointer to memory that you did not explicitly allocate is always a red flag, I think. You really need to understand how everything works, and be very careful.

I would default to just using std::unique_ptr<Node> in a situation like this, especially since using std::list suggests performance isn't critical here, so the additional indirection probably doesn't matter.

2 comments

unique_ptr seems inappropriate here since the pointers aren't unique. shared_ptr doesn't even work because it looks like this data structure is representing a graph and would expect to have cycles. Perhaps you could use some sort of weak pointer that gets nulled out when the target object is destroyed, but that would not have fixed the bug here, just replaced segfaults with some more controlled exception or panic.

In fact, full-on garbage collection wouldn't have prevented the bug here, and could arguably have made it worse. The problem is that nodes were unexpectedly being copied and then the originals deleted. With GC, you'd still have the copy, but never delete the originals, so you end up with split brain where there are multiple copies of each node and various pointers point to the wrong ones. That'd be pretty painful to debug!

IMO the language-level problem, if there is one, is that C++ is too willing to copy in cases where you would expect it to move instead. This is, of course, for backwards compatibility with the before-times when there was no move and copying was the only logical thing to do. But I think life would be better these days if all non-trivial copies had to be requested explicitly.

> Perhaps you could use some sort of weak pointer that gets nulled out when the target object is destroyed

std::shared_ptr comes with std::weak_ptr. Referencing counting is rather ham-fisted approach but is certainly a solution.

> IMO the language-level problem, if there is one, is that C++ is too willing to copy in cases where you would expect it to move instead.

IMO that's not a problem in the language but a problem with the engineer (misunderstanding when std::move is necessary) and the tooling (linter/static analyzer not clearly identifying that something should be moved instead, and raising a linter warning for it).

For that matter, the places where I see std::list used aren't places where "performance isn't important" but rather places where an inexperienced engineer was put in charge of implementation and a senior engineer accepted it. I can't remember the last time I accepted someone using std::list in a code review because there has always been a better design available even if it necessitated some teaching. If a stable pointer address is needed then indeed a smart pointer is the correct solution (perhaps std::vector<std::unique_ptr>). There are other reasons I've had coworkers cite for using std::list (eg constant allocation time) but that's generally resolved with std::vector.reserve(upper bound to size) or eg a slab allocator (unfortunately, I'm not aware of a standard-provided slab allocator, though to be fair I'm not very familiar with C++ standard allocators in general).

> I think life would be better these days if all non-trivial copies had to be requested explicitly

While I don't agree superficially (smells like bringing along deep-copy problems), I think the idea merits some thought experiments.

It would be fairly trivial to do that for non-plain-old-data types by deleting the copy constructor/operator (so it cannot happen implicitly) and providing a `make_copy(...)` function instead.

Agreed that std::list is never the best solution. In every case where I want linked list semantics, it's because I want to be able to dynamically add and remove objects from the list without any allocations at all. The only way to achieve that is an intrusive linked list design, which std::list is not...
> I want to be able to dynamically add and remove objects from the list without any allocations at all. The only way to achieve that is an intrusive linked list design

No it isn't.

    std::array<std::optional<std::pair<T, std::pair<std::size_t /* prev index */, std::size_t /* next index */>>>, 256U /* or whatever your maximum size is \*/>.
Indexing does come with a wart: you'd need a sentry value for a "no prev" or "no next" value, I'd just use std::numeric_limits<size_t>::max() for that. And of course when you move objects within the container you'd need to update the indices.

If you don't know your upper size bound at compile time, then replace `array` with `vector`, and reserve your runtime-known upper bound. As long as you never violate your upper bound then no (re-)allocations occur (unless T's constructor allocates).

In most (quite possibly all) cases I don't know the upper bound even at runtime. So, this ends up requiring allocation.
You don't think it's useful to calculate some reasonable upper bound if only to avoid resource exhaustion?
The good news is that since C++ containers aren't special to the language, you can just implement your own wrapper classes that disable the copy ctor (and provide an explicit `.clone()` instead). Coupled with `#pragma GCC poison` it is pretty easy to blacklist legacy footguns in source files at least (though not in headers without some aggressive work).

... and yet, almost all vulnerabilities in C++ code are still written in C style, not even legacy C++.

Yeah I pretty much only use my own alternate container implementations (from KJ[0]), which avoid these footguns, but the result is everyone complains our project is written in Kenton-Language rather than C++ and there's no Stack Overflow for it and we can't hire engineers who know how to write it... oops.

[0] https://github.com/capnproto/capnproto/blob/v2/kjdoc/tour.md

Every work place I've worked at has their own container classes for performance reasons.

We did use some C++ standard library but very sparingly. For example, std::unique_ptr<> was fine, but std::shared_ptr<> was not because of the way it implicitly stores it's reference, so we had our own implementation that exposed the reference.

When I was doing console games this was a must. You would often have restrictions where the container itself should live in a different area of memory to the items which is much easier to manage with your own classes.

The problem with that is that it is not providing the standard library API, but rather its own API. A good alternative should only remove footguns (ideally, to the point that you can replace with typedefs and get correct behavior just with less protection in case of future changes).

Aside, it's embarrassing that all these alternative libraries fail to implement automatic correct signed/unsigned mixing. It's not like it's particularly hard to implement!

Many std library footguns are inherent to the API...
I mean the article explains the original author did take explicit steps to keep the Nodes fixed in memory (std::list), because they knew it could be a problem if the Nodes moved.

They just got tripped up by an obscure language feature that made the Nodes move anyway.