Hacker News new | ask | show | jobs
by stinos 3 days ago
and even std::map, std::unordered_map, even std::vector (!) suck

It's really hard to take your comment serious because of generalization like this. Maybe they're not usable for your particular usecase but that doesn't mean they suck. Just like there's a 'million' ways that C++ sucks in your book, there's a reason there's millions of lines of code out there where these containers are valid usecases and hence work without issues whatsoever nor a need to replace them with something else.

2 comments

std::map and std::unordered_map are just unbelievably shitty implementations. The former is a red-black tree, which in my entire programming career I have needed to reach for like... twice? It's just not the right container for almost any problem you have, yet it's the one that gets the short, sweet name. The latter is a bucket-based hashmap, which is about the worst kind of hashmap that can be built. On top of that, their APIs are also really annoying to use compared to, say, Python or Rust's implementation. At least C++20 finally added a simple contains method, but something like setdefault is just a chore to get implemented.
Not really apropos any of this but a vivid career memory for me is leaving a startup that was all in C++ and going to Arbor Networks, which was all C, and porting the STLport map to rbtree.c and making it our default container. No, I have no idea why I did that. Different times!
Pretty much the only collections I use are:

1. arrays

2. linked lists

3. hash tables

4. simple binary trees

They're not useable for anything serious, i.e. high throughput, low frequency, massively concurrent work. In other words, most of the things for which you shouldn't better have chosen a different language in the first place.

They're also unusable by the way because of ergonomic and software architecture factors, such as bad modularity, terrible compile times, unreadable error messages, unreadable symbol names...

Yes that is overgeneralizing a little bit but it's largely true.

The problem is typically not the containers themselves but all the other bad decisions that they push you to make in order to work around their "small issues".

The huge problem is that these containers can get you started quickly, i.e. leetcode type stuff and single threaded stuff, but at some point you'll realize your architecture ended up completely in the wrong place because of that.

If you haven't been thinking deeply about memory management and concurrency, you won't be able to understand, no offense meant. I've just fixed another subsystem that was completely overwhelmed, seeing 8x bandwidth gains already on a small testsystem, but the factor is basically unbounded when moving to bigger systems, when it's about contended vs uncontended.

> anything serious, i.e. high throughput, low frequency, massively concurrent work

Why is only 'high throughput, low frequency, massively concurrent work' considered 'serious'?

They're just clearly inferior in pretty much any situation.

The map stuff the other posters summed up well but even std::vector is dogshit with pretty much all implementations having inlined grow code in push_back, a not too great API and missed optimisations e.g. no trivial relocation when growing the vector / moving it and no useful APIs such as "grow but don't initialise"...

To be fair grow-but-don't-initialize is a pretty fundamental part of the API, the reserve() method.

But already the basic premise that you should push back without thinking is wrong. You will suffer reallocations and invalisations when you least expected them, and frankly you have to architect around that fact which is a terrible restriction. You can work around by pre reserving but at that point it's just a basic fixed heap allocated array but worse because the type gives you a weird look all the time, "I'll realloc as soon as you don't pay attention, harhar"!

std::vector lacks what I call the "bifurcated reservation API". It has Rust's Vec::reserve_exact but not Vec::reserve -- these APIs serve subtly different purposes, the former (which C++ calls just "reserve") says "Here's a hint for exactly how big this container will ever grow" while the latter says "Here's a hint for how big this container will get in my immediate future, but it might grow further later".

The implementation always tries to grow (if necessary) to the exact size chosen for Vec::reserve_exact, but for plain Vec::reserve if growth is needed it always grows exponentially, not to the exact size, preserving the O(1) push cost.

For a typical "doubling" growable array type, if we're pushing groups of ten items, reserve_exact or C++ grows like 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ... which is much worse than O(1) whereas the correct reserve grows 10, 20, 40, 80, 160 preserving O(1)

For trivial types you can work around this in C++ with a little work, and for the non-trivial types you can work around it with a bunch more extra code, but you probably won't.

Bjarne among other people teaching C++ recommend just not using the reservation API as a reservation API because of this problem†, and the resulting teaching definitely leaks into CS graduates and even into languages which have the correct API and so you have to un-teach the bad lesson.

In applications where you actually can't afford to pay for growth (or at least in some cases can't afford this) I also like Vec::push_within_capacity which I believe comes from Rust for Linux where the kernel legitimately needs this "If there's room push, otherwise I have a plan B" approach.

† To Bjarne this API is instead conceived of as a way to preserve reference validity. Since it won't grow, our references will still work.

Vec::reserve() is the behaviour you get from C++ std::vector push_back() (implicit just-in-time reserve), so right now I can't see a situation where I'd want the explicit Rust version even if I didn't think the whole push realloc thing is mostly a bad idea.

Yes, Rust version allows you to maybe skip a reallocation step or two by doing explicit up front reallocation. But remember most allocation work is always from the last grow anyway. The Rust version seems like a microoptimization, giving a little bit more explicit control in a situation where you've already pretty much given up control and gone like, throw hands in the air, we're doing push_back()!

Feels like typical improvement/perfection tunnel vision syndrome to me, though. That syndrome is engrained in C++ community and I think also Rust community, although it looks like the latter took the chance to do many things better from the start learning from C++'s mistakes.

Realloc is pretty much never the right way in whatever form, and I've never seen any need to include realloc in any of my own allocators (mostly blockallocators and linear allocators and pools/free lists, sometimes using malloc/free).

You are free to make your own definition, what are your suggestions?

(Obviously I meant to say low latency, not low frequency)

That is a cop-out. You made the assertion, you define it.

What about these particular workloads (and the environments they're used in) make them 'serious' and why are other workloads 'lesser' and therefore the standard library 'suffices'? Why not use better containers for everything? Google, for instance, universally recommends Abseil.

I stand by my statement. You are welcome to contend it, but then please actually suggest alternative workloads that could qualify as serious?

Google also recommends using Golang in many cases, which was explicitly designed for not-so-experienced people. It is more geared towards creating services quickly and robustly, not towards squeezing out the last 10x of performance.

I can't say anything about abseil, having never used it. "By Google" is not an immediate seal of quality, and it doesn't mean that one size fits all. I've recently seen a list of .so objects required in order link grpc (also by Google), there were like dozens of abseil .so's in the list IIRC. I don't like it! In my experience, validated countless times in my own practice, complexity => slow and hard to maintain and simplicity => fast and easy to change.

Templatized C++ containers are generic recipes, they can't make use of local context and hence don't lead to the simplest solutions. They produce tons of boilerplate. They give you a decent single-threaded baseline performance in many cases, sure, but now start hammering these data structures using 16 CPUs in parallel... you might find that you should have designed your application completely differently.

Like WalterBright says, I too use very simple data structures almost exclusively (arrays, linked lists, linear allocation). Most of these coded ad-hoc everywhere, very straightforward, very adaptable. Even hash maps I only use infrequently -- if I can, I just make keys such that I can index directly. I get performance from knowing exactly what happens, and minimize the work that needs to happen. I create immutable data objects (write-once) where possible. I've created my own pooled block-allocator for power of 2 sized allocations with book-keeping in shadow-memory (preventing fragmentation), massively reducing syscalls that modify virtual memory mappings and closely controlling memory used by various subsystems.

I don't expect a library like abseil to solve many of my problems, it probably makes some things "easy" by taking away the very control from me that I need to solve the problem I have, resulting in unsolved problems and complexity.

Because if you don't need any of these, any slop implementation will do.
If you haven't been thinking deeply about memory management and concurrency, you won't be able to understand, no offense meant

I do think about that, when needed. My point is that these containers can be 'good enough' in places where it doesn't really matter, not that they're always the go-to thing. E.g. I really don't see any issue using a map as part of a configuration type of object which gets read from args or json and which only gets used once at application start.

Yes. When you don't have any specific requirements, any mediocre data structure will do.