Hacker News new | ask | show | jobs
by mgaunard 1666 days ago
That is not true.

Anyone who programs low-latency C++ knows that the libstdc++ implementation is great (which is what 99.9% of people use) while others tend to be less stellar.

It's just a segmented vector. The libstdc++ implementation always allocates one segment even if empty, and while I've seen low-latency guidelines arguing for empty things not allocating, my personal guideline is to always allocate reasonable capacity on construction rather than on first insertion.

A ring buffer is a completely different data structure, it only works for fixed-capacity queues.

3 comments

>which is what 99.9% of people use

Maybe we have different ideas about what constitutes "low-latency" but in HFT std::deque is rarely used. Much like std::unordered_map, which allocates every insert potentially costing up to a microsecond for each insert.

>It's just a segmented vector.

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USER... we can look at the source. This "segmented vector" is `_Tp** _M_map`, essentially a vector of vectors. This means potentially twice the pointer-chasing, as we have to do two lookups. It also means allocation/deallocation each time a new segment is added/destroyed, which leads to more allocation than when using a vector (first need to allocate the segment, then potentially allocate entries within that).

>A ring buffer is a completely different data structure, it only works for fixed-capacity queues.

Where possible it's better to use a fixed capacity queue, especially one where the capacity is a power of 2 so wrap-around calculation can be done more efficiently. But the same kind of thing can be done for a dynamic-capacity queue, by resizing the whole backing array when capacity is reached.

HFT needn’t mean low latency is required but of course the two tend to be related.

If trivial allocations might cost you a microsecond then you can’t allocate at all on a latency-sensitive critical path and should probably get a better allocator. Also, be wary of power-of-two sized things because if they are aligned they are likely to have poor cache performance (too many things compete for the same set of cache lines).

I've worked in many HFT shops and that's not my experience. A lot of those firms have poor C++ code though, but std::deque was typically used in the better ones.

It's not one allocation per element, as said previously it's a segmented vector. It works similarly to virtual memory: fixed-size segments are allocated for contiguous storage, while another parent array points to the segments. So appending/prepending either needs to append/prepend to the last/first segment or to allocate a new segment if full, which is what makes those operations more efficient than on a vector. Yes there is indirection, that's what segmented means. You also get stability of the memory location which is important for many data structures. The article even tells you how big the segments are in the two implementations being looked at.

Going back to HFT, a similar data structure that's definitely heavily used there is a pool, which has a configurable segment size (potentially variable) and doesn't have order, enabling faster deallocation with a freelist.

I thought a resizable ring buffer was a valid implementation, but maybe I'm thinking of Rust's VecDeque and there's something about the semantics of std::deque that doesn't allow that.
I don't think insert at front/back is allowed an amortized bound, it has to be flat constant. But then I don't see how libc++'s is conforming if this article is true, so I'm not entirely sure what's going on there.
Ohhhh, I think I found the issue with a ring buffer implementation. Would edit my post if I was in the edit window.

resize() doesn't invalidate existing references in std::deque.

How could you make any operation O(1) with resizing?
It is possible to deamortize the O(n) resizing step, meaning you spend O(1) extra time over the O(n) operations leading to the resize, so that when the resize needs to happen, you don't need to spend O(n) time. This assumes that allocation/deallocation takes O(1) time regardless of the buffer size.

Deamortized resizing can work as follows: When inserting into a vector that is half full, allocate a new buffer twice the size and copy two elements into the buffer. For the subsequent insertions, copy two elements every time you insert one element. This guarantees that you are done copying all elements into the new buffer once the old buffer is full, and you only performed a constant number of instructions with each operation.

It's possible to deamortize many quite advanced data structures in theory - see e.g. the lecture notes on "Maintaining Order in a List" on this course webpage: https://cs.au.dk/~gerth/aa11/

... but as with everything in theoretical computer science you are not guaranteed that the implementation works better in practice than the "naive" approach. Deamortization techniques do have a lot of practical relevance in hard realtime applications, but I don't know how much of "real" hard realtime stuff uses academic techniques and how much of it is mostly engineering.

Amortized O(1)
Amortized O(1) does not satisfy the specified requirements.
Most STL data structures take an allocator template argument. If you fight the compiler enough to make your own std::allocator with preallocation then the STL is actually pretty nice.
I think he might be talking about a segmented ring buffer, which is similar to a deque but each segment is a ring buffer. It gives much better insertion complexity.

Anyway I have to echo his point that I've never found a situation where deque performs better than vector, even when you really might expect it to. I guess the extra allocations are too slow. Maybe it would be more useful with arenas or something.

> my personal guideline is to always allocate reasonable capacity on construction rather than on first insertion.

That works until probabilistically there is a decent chance the capacity will always be zero.