| Another hint. Speaking of breadcrumbs, if the ring buffer has fixed-size entries, a reader can come in later and start reading old entries first, say halfway back. This is helpful if you want to start a new reader and then kill an old one, and not skip any entries. It helps if the ring has power-of-two size, and the head pointer/index is 64 bits and increases monotonically. Then the high bits are easily masked off on each use, so that arithmetic on pairs of positions is simpler. For variable-sized entries, an array of N "breadcrumbs", past positions near 1/Nth indices, allows jumping in at earlier positions. If traffic is low enough, you might be able to buffer a whole day's traffic, and get random access starting from breadcrumbs; otherwise, you can log old entries to a sequential file, and also log the breadcrumbs, translated to file offsets, as a global index. Downstream processes can each sequentially log an individual field of each record, with a breadcrumb index to enable full records to be reconstructed. Often these column logs can be compressed, with enormous efficiency, between breadcrumbs: 98% compression may be easy to achieve for slowly-changing or limited-alphabet values. Lz4 and Zstd are excellent compression engines. Lz4 really shines for fast decompression. There is no excuse for zlib/gz compression anymore. |
Unfortunately, these sorts of practical techniques are not known to many programmers and it would be nice if somebody (eg. you :-) were to list it on a website/book with some sample code for everybody's benefit.