Hacker News new | ask | show | jobs
by logicchains 1666 days ago
>which is what 99.9% of people use

Maybe we have different ideas about what constitutes "low-latency" but in HFT std::deque is rarely used. Much like std::unordered_map, which allocates every insert potentially costing up to a microsecond for each insert.

>It's just a segmented vector.

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USER... we can look at the source. This "segmented vector" is `_Tp** _M_map`, essentially a vector of vectors. This means potentially twice the pointer-chasing, as we have to do two lookups. It also means allocation/deallocation each time a new segment is added/destroyed, which leads to more allocation than when using a vector (first need to allocate the segment, then potentially allocate entries within that).

>A ring buffer is a completely different data structure, it only works for fixed-capacity queues.

Where possible it's better to use a fixed capacity queue, especially one where the capacity is a power of 2 so wrap-around calculation can be done more efficiently. But the same kind of thing can be done for a dynamic-capacity queue, by resizing the whole backing array when capacity is reached.

3 comments

HFT needn’t mean low latency is required but of course the two tend to be related.

If trivial allocations might cost you a microsecond then you can’t allocate at all on a latency-sensitive critical path and should probably get a better allocator. Also, be wary of power-of-two sized things because if they are aligned they are likely to have poor cache performance (too many things compete for the same set of cache lines).

I've worked in many HFT shops and that's not my experience. A lot of those firms have poor C++ code though, but std::deque was typically used in the better ones.

It's not one allocation per element, as said previously it's a segmented vector. It works similarly to virtual memory: fixed-size segments are allocated for contiguous storage, while another parent array points to the segments. So appending/prepending either needs to append/prepend to the last/first segment or to allocate a new segment if full, which is what makes those operations more efficient than on a vector. Yes there is indirection, that's what segmented means. You also get stability of the memory location which is important for many data structures. The article even tells you how big the segments are in the two implementations being looked at.

Going back to HFT, a similar data structure that's definitely heavily used there is a pool, which has a configurable segment size (potentially variable) and doesn't have order, enabling faster deallocation with a freelist.

I thought a resizable ring buffer was a valid implementation, but maybe I'm thinking of Rust's VecDeque and there's something about the semantics of std::deque that doesn't allow that.
I don't think insert at front/back is allowed an amortized bound, it has to be flat constant. But then I don't see how libc++'s is conforming if this article is true, so I'm not entirely sure what's going on there.
Ohhhh, I think I found the issue with a ring buffer implementation. Would edit my post if I was in the edit window.

resize() doesn't invalidate existing references in std::deque.

How could you make any operation O(1) with resizing?
It is possible to deamortize the O(n) resizing step, meaning you spend O(1) extra time over the O(n) operations leading to the resize, so that when the resize needs to happen, you don't need to spend O(n) time. This assumes that allocation/deallocation takes O(1) time regardless of the buffer size.

Deamortized resizing can work as follows: When inserting into a vector that is half full, allocate a new buffer twice the size and copy two elements into the buffer. For the subsequent insertions, copy two elements every time you insert one element. This guarantees that you are done copying all elements into the new buffer once the old buffer is full, and you only performed a constant number of instructions with each operation.

It's possible to deamortize many quite advanced data structures in theory - see e.g. the lecture notes on "Maintaining Order in a List" on this course webpage: https://cs.au.dk/~gerth/aa11/

... but as with everything in theoretical computer science you are not guaranteed that the implementation works better in practice than the "naive" approach. Deamortization techniques do have a lot of practical relevance in hard realtime applications, but I don't know how much of "real" hard realtime stuff uses academic techniques and how much of it is mostly engineering.

Amortized O(1)
Amortized O(1) does not satisfy the specified requirements.
Oh, ok, it used to be amortized constant O(1), but changed at some point apparently.

https://web.archive.org/web/20150310134852/https://en.cppref...

Most STL data structures take an allocator template argument. If you fight the compiler enough to make your own std::allocator with preallocation then the STL is actually pretty nice.