| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by logicchains 1666 days ago

>which is what 99.9% of people use

Maybe we have different ideas about what constitutes "low-latency" but in HFT std::deque is rarely used. Much like std::unordered_map, which allocates every insert potentially costing up to a microsecond for each insert.

>It's just a segmented vector.

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USER... we can look at the source. This "segmented vector" is `_Tp** _M_map`, essentially a vector of vectors. This means potentially twice the pointer-chasing, as we have to do two lookups. It also means allocation/deallocation each time a new segment is added/destroyed, which leads to more allocation than when using a vector (first need to allocate the segment, then potentially allocate entries within that).

>A ring buffer is a completely different data structure, it only works for fixed-capacity queues.

Where possible it's better to use a fixed capacity queue, especially one where the capacity is a power of 2 so wrap-around calculation can be done more efficiently. But the same kind of thing can be done for a dynamic-capacity queue, by resizing the whole backing array when capacity is reached.

3 comments

dan-robertson 1666 days ago

HFT needn’t mean low latency is required but of course the two tend to be related.

If trivial allocations might cost you a microsecond then you can’t allocate at all on a latency-sensitive critical path and should probably get a better allocator. Also, be wary of power-of-two sized things because if they are aligned they are likely to have poor cache performance (too many things compete for the same set of cache lines).

link

mgaunard 1666 days ago

I've worked in many HFT shops and that's not my experience. A lot of those firms have poor C++ code though, but std::deque was typically used in the better ones.

It's not one allocation per element, as said previously it's a segmented vector. It works similarly to virtual memory: fixed-size segments are allocated for contiguous storage, while another parent array points to the segments. So appending/prepending either needs to append/prepend to the last/first segment or to allocate a new segment if full, which is what makes those operations more efficient than on a vector. Yes there is indirection, that's what segmented means. You also get stability of the memory location which is important for many data structures. The article even tells you how big the segments are in the two implementations being looked at.

Going back to HFT, a similar data structure that's definitely heavily used there is a pool, which has a configurable segment size (potentially variable) and doesn't have order, enabling faster deallocation with a freelist.

link

monocasa 1666 days ago

I thought a resizable ring buffer was a valid implementation, but maybe I'm thinking of Rust's VecDeque and there's something about the semantics of std::deque that doesn't allow that.

link

kadoban 1666 days ago

I don't think insert at front/back is allowed an amortized bound, it has to be flat constant. But then I don't see how libc++'s is conforming if this article is true, so I'm not entirely sure what's going on there.

link

monocasa 1665 days ago

Ohhhh, I think I found the issue with a ring buffer implementation. Would edit my post if I was in the edit window.

resize() doesn't invalidate existing references in std::deque.

link

mgaunard 1666 days ago

How could you make any operation O(1) with resizing?

link

rav 1665 days ago

It is possible to deamortize the O(n) resizing step, meaning you spend O(1) extra time over the O(n) operations leading to the resize, so that when the resize needs to happen, you don't need to spend O(n) time. This assumes that allocation/deallocation takes O(1) time regardless of the buffer size.

Deamortized resizing can work as follows: When inserting into a vector that is half full, allocate a new buffer twice the size and copy two elements into the buffer. For the subsequent insertions, copy two elements every time you insert one element. This guarantees that you are done copying all elements into the new buffer once the old buffer is full, and you only performed a constant number of instructions with each operation.

It's possible to deamortize many quite advanced data structures in theory - see e.g. the lecture notes on "Maintaining Order in a List" on this course webpage: https://cs.au.dk/~gerth/aa11/

... but as with everything in theoretical computer science you are not guaranteed that the implementation works better in practice than the "naive" approach. Deamortization techniques do have a lot of practical relevance in hard realtime applications, but I don't know how much of "real" hard realtime stuff uses academic techniques and how much of it is mostly engineering.

link

monocasa 1665 days ago

Amortized O(1)

link

mgaunard 1665 days ago

Amortized O(1) does not satisfy the specified requirements.

link

monocasa 1665 days ago

Oh, ok, it used to be amortized constant O(1), but changed at some point apparently.

https://web.archive.org/web/20150310134852/https://en.cppref...

link

CoolGuySteve 1665 days ago

Most STL data structures take an allocator template argument. If you fight the compiler enough to make your own std::allocator with preallocation then the STL is actually pretty nice.

link