
Another hint.

Speaking of breadcrumbs, if the ring buffer has fixed-size entries, a reader can come in later and start reading old entries first, say halfway back. This is helpful if you want to start a new reader and then kill an old one, and not skip any entries.

It helps if the ring has a power-of-two size and the head pointer/index is 64 bits and increases monotonically. Each use then finds its slot by masking off the high bits, and arithmetic on pairs of positions (e.g., how far a reader lags the head) becomes plain subtraction with no wraparound cases.
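Here is a minimal single-threaded Python sketch of the fixed-size case. It assumes one writer, with Python objects standing in for fixed-size records. The class name `FixedRing` and its methods are hypothetical. Python integers don't overflow, so the 64-bit head is only simulated here. In C, a `uint64_t` head would not wrap in any realistic lifetime. A concurrent reader would also need to re-check the head after copying a slot to detect being lapped by the writer, and this sketch omits that.

```python
class FixedRing:
    """Ring of fixed-size entries indexed by a monotonically increasing head.

    Hypothetical sketch: single writer, single thread, power-of-two capacity.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.capacity = capacity
        self.mask = capacity - 1          # low bits select the slot
        self.slots = [None] * capacity
        self.head = 0                     # total entries ever written; only increases

    def write(self, entry) -> None:
        # Mask off the high bits to find the slot; the head itself never wraps.
        self.slots[self.head & self.mask] = entry
        self.head += 1

    def oldest(self) -> int:
        # Oldest position still held: plain subtraction, clamped at 0 early on.
        return max(0, self.head - self.capacity)

    def halfway_back(self) -> int:
        # Start position for a new reader: half a ring behind the head.
        return max(self.oldest(), self.head - self.capacity // 2)

    def read_from(self, pos: int):
        """Yield entries from absolute position pos up to the current head."""
        if pos < self.oldest():
            raise IndexError("entry already overwritten")
        while pos < self.head:
            yield self.slots[pos & self.mask]
            pos += 1


ring = FixedRing(8)
for i in range(20):
    ring.write(i)
start = ring.halfway_back()           # 20 - 8 // 2 = 16
print(list(ring.read_from(start)))    # [16, 17, 18, 19]
print(ring.head - start)              # lag of the new reader: 4
```

A new reader that starts half a ring back replays recent entries the old reader may not have finished. It also has half a ring of slack before the writer overwrites its position.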

For variable-sized entries, a reader cannot compute where an older entry begins. Instead, keep an array of N "breadcrumbs": start positions of past entries, spaced roughly 1/N of the ring apart. A reader can then jump in at any of these earlier positions. If traffic is low enough, you might be able to buffer a whole day's traffic and get random access starting from breadcrumbs. Otherwise, you can log old entries to a sequential file, and also log the breadcrumbs, translated to file offsets, as a global index.
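Here is a hedged Python sketch of a byte ring with N breadcrumbs for the variable-sized case. The layout is an assumption for illustration. Each entry is a 4-byte little-endian length followed by its payload, and positions are absolute byte offsets that only increase. The first entry to start in each 1/N segment of the byte stream becomes that segment's breadcrumb. `VarRing` and its methods are hypothetical names. A breadcrumb remains usable only while it lies within one ring capacity of the head.

```python
import struct


class VarRing:
    """Byte ring of length-prefixed, variable-sized entries with N breadcrumbs.

    Hypothetical layout: 4-byte little-endian length, then the payload.
    """

    def __init__(self, capacity: int, n_crumbs: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1) or capacity % n_crumbs:
            raise ValueError("capacity must be a power of two divisible by n_crumbs")
        self.buf = bytearray(capacity)
        self.mask = capacity - 1
        self.segment = capacity // n_crumbs   # bytes per 1/N segment
        self.crumbs = [None] * n_crumbs       # crumbs[k]: entry start in latest segment s with s % N == k
        self.last_seg = -1
        self.head = 0                         # total bytes ever written; only increases

    def _put(self, pos: int, data: bytes) -> None:
        for i, b in enumerate(data):          # byte-at-a-time keeps wraparound obvious
            self.buf[(pos + i) & self.mask] = b

    def _get(self, pos: int, n: int) -> bytes:
        return bytes(self.buf[(pos + i) & self.mask] for i in range(n))

    def write(self, payload: bytes) -> None:
        record = struct.pack("<I", len(payload)) + payload
        if len(record) > len(self.buf):
            raise ValueError("entry larger than the ring")
        seg = self.head // self.segment
        if seg != self.last_seg:
            # First entry starting in a new segment becomes its breadcrumb.
            self.crumbs[seg % len(self.crumbs)] = self.head
            self.last_seg = seg
        self._put(self.head, record)
        self.head += len(record)

    def valid_crumbs(self) -> list:
        # Entries starting within the last `capacity` bytes are still intact.
        floor = self.head - len(self.buf)
        return sorted(c for c in self.crumbs if c is not None and c >= floor)

    def read_from(self, pos: int):
        """Yield payloads from entry start `pos` up to the head."""
        if pos < self.head - len(self.buf):
            raise IndexError("entry already overwritten")
        while pos < self.head:
            (length,) = struct.unpack("<I", self._get(pos, 4))
            yield self._get(pos + 4, length)
            pos += 4 + length


ring = VarRing(64, 4)
for i in range(30):
    ring.write(bytes([i]) * (i % 7))
oldest = ring.valid_crumbs()[0]
print([len(p) for p in ring.read_from(oldest)])
```

If old entries are spilled to a sequential file instead, the same breadcrumb positions, translated to file offsets, would serve as the global index described above.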

Downstream processes can each sequentially log an individual field of each record, with a breadcrumb index to enable full records to be reconstructed. These column logs can often be compressed very efficiently in blocks between breadcrumbs: a 98% size reduction may be easy to achieve for slowly changing or limited-alphabet values.
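Below is a rough Python sketch of per-field column logs with a breadcrumb every `BLOCK` records. The following assumptions are for illustration only:

- All fields are integers.
- Each block is delta-encoded, restarting at zero so blocks stay independent, and then compressed on its own.
- The standard library's `lzma` stands in for LZ4 or Zstd, whose Python bindings are third-party.

`write_columns`, `read_record`, and `BLOCK` are hypothetical names. Each field's log is built independently, so in a real pipeline each could be written by a separate downstream process.

```python
import lzma
import struct

BLOCK = 1024  # hypothetical: records between breadcrumbs


def _encode(values):
    """Delta-encode a block (restarting from 0) and compress it on its own."""
    out, prev = bytearray(), 0
    for v in values:
        out += struct.pack("<q", v - prev)
        prev = v
    return lzma.compress(bytes(out))


def _decode(blob):
    """Decompress one block and undo the delta encoding."""
    values, acc = [], 0
    for (delta,) in struct.iter_unpack("<q", lzma.decompress(blob)):
        acc += delta
        values.append(acc)
    return values


def write_columns(records, fields):
    """Return one byte log per field plus a breadcrumb index of block offsets."""
    logs = {f: bytearray() for f in fields}
    crumbs = []  # crumbs[b][f] = offset of block b in field f's log
    for start in range(0, len(records), BLOCK):
        block = records[start:start + BLOCK]
        crumbs.append({f: len(logs[f]) for f in fields})
        for f in fields:
            logs[f] += _encode([r[f] for r in block])
    return logs, crumbs


def read_record(logs, crumbs, fields, i):
    """Rebuild record i by decoding only its block from each column log."""
    b = i // BLOCK
    record = {}
    for f in fields:
        start = crumbs[b][f]
        end = crumbs[b + 1][f] if b + 1 < len(crumbs) else len(logs[f])
        record[f] = _decode(bytes(logs[f][start:end]))[i % BLOCK]
    return record


fields = ("ts", "side", "price")
records = [{"ts": 1_700_000_000 + i, "side": i % 2, "price": 100 + i // 50}
           for i in range(5000)]
logs, crumbs = write_columns(records, fields)
raw = 8 * len(fields) * len(records)
packed = sum(len(log) for log in logs.values())
print(f"{packed} of {raw} bytes ({100 * (1 - packed / raw):.1f}% smaller)")
```

Every block restarts its delta chain at a breadcrumb. Reading one record therefore costs one block decompression per field rather than a scan of the whole log.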

LZ4 and Zstd are excellent compression engines; LZ4 really shines for fast decompression. There is no excuse for zlib/gz compression anymore.