The easy speedup is to use two mutexes: one that protects head and tail_cached, and another that protects tail and head_cached, each aligned to its own cache line so the two sides don't interfere. In other words, take the RingBufferV5 from the article and declare the members like this:

  std::array<T, N> buffer_;
  alignas(64) absl::Mutex hmu_;
  std::size_t head_{0};
  std::size_t tail_cached_{0};

  alignas(64) absl::Mutex tmu_;
  std::size_t tail_{0};
  std::size_t head_cached_{0};

Then change the code to forget the atomics and just use the locks. On my system this is more than ten times faster than the baseline naïve thread-safe RingBufferV2. That's what I mean about using a bogus baseline.
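
To make the idea concrete, here is a minimal sketch of what such a two-lock buffer could look like. It is not the article's code or the exact version timed above. Several things are assumptions: `std::mutex` stands in for `absl::Mutex` so the example builds without Abseil, the producer is taken to advance `head_` while the consumer advances `tail_`, there is exactly one producer and one consumer, and one slot is left empty so full and empty can be told apart. The names `TwoLockRingBuffer` and `transfer_sum` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>

// Sketch only: single producer, single consumer, std::mutex instead of
// absl::Mutex. Assumed convention: push advances head_, pop advances tail_.
template <typename T, std::size_t N>
class TwoLockRingBuffer {
  static_assert(N >= 2, "one slot is always left empty");

 public:
  // Producer side. Returns false if the buffer is full.
  bool push(const T& value) {
    std::unique_lock<std::mutex> lock(hmu_);
    const std::size_t next = (head_ + 1) % N;
    if (next == tail_cached_) {
      // Looks full: refresh the cached tail. hmu_ is released first so the
      // two locks are never held together (no lock-order deadlock with pop).
      lock.unlock();
      std::size_t tail;
      {
        std::lock_guard<std::mutex> tail_lock(tmu_);
        tail = tail_;
      }
      lock.lock();
      tail_cached_ = tail;
      if (next == tail_cached_) return false;  // really full
    }
    buffer_[head_] = value;
    head_ = next;
    return true;
  }

  // Consumer side. Returns false if the buffer is empty.
  bool pop(T& out) {
    std::unique_lock<std::mutex> lock(tmu_);
    if (tail_ == head_cached_) {
      // Looks empty: refresh the cached head, again without nesting locks.
      lock.unlock();
      std::size_t head;
      {
        std::lock_guard<std::mutex> head_lock(hmu_);
        head = head_;
      }
      lock.lock();
      head_cached_ = head;
      if (tail_ == head_cached_) return false;  // really empty
    }
    out = buffer_[tail_];
    tail_ = (tail_ + 1) % N;
    return true;
  }

 private:
  std::array<T, N> buffer_{};
  alignas(64) std::mutex hmu_;  // guards head_ and tail_cached_
  std::size_t head_{0};
  std::size_t tail_cached_{0};

  alignas(64) std::mutex tmu_;  // guards tail_ and head_cached_
  std::size_t tail_{0};
  std::size_t head_cached_{0};
};

// Hypothetical demo: the producer sends 0..count-1, the consumer checks FIFO
// order and returns the sum of everything it received.
inline long long transfer_sum(int count) {
  TwoLockRingBuffer<int, 1024> rb;
  long long sum = 0;
  std::thread producer([&] {
    for (int i = 0; i < count; ++i)
      while (!rb.push(i)) std::this_thread::yield();
  });
  std::thread consumer([&] {
    for (int i = 0; i < count; ++i) {
      int v = -1;
      while (!rb.pop(v)) std::this_thread::yield();
      assert(v == i);
      sum += v;
    }
  });
  producer.join();
  consumer.join();
  return sum;
}
```

In the common case each side takes only its own lock and touches only its own cache line; the other side's mutex is acquired just to refresh a stale cached index when the buffer looks full or empty. Dropping the first lock before taking the second is only safe here because of the single-producer, single-consumer assumption.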