| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ncmncm 2507 days ago

Another hint.

Speaking of breadcrumbs, if the ring buffer has fixed-size entries, a reader can come in later and start reading old entries first, say halfway back. This is helpful if you want to start a new reader and then kill an old one, and not skip any entries.

It helps if the ring has power-of-two size, and the head pointer/index is 64 bits and increases monotonically. Then the high bits are easily masked off on each use, so that arithmetic on pairs of positions is simpler.

For variable-sized entries, an array of N "breadcrumbs", past positions near 1/Nth indices, allows jumping in at earlier positions. If traffic is low enough, you might be able to buffer a whole day's traffic, and get random access starting from breadcrumbs; otherwise, you can log old entries to a sequential file, and also log the breadcrumbs, translated to file offsets, as a global index.

Downstream processes can each sequentially log an individual field of each record, with a breadcrumb index to enable full records to be reconstructed. Often these column logs can be compressed, with enormous efficiency, between breadcrumbs: 98% compression may be easy to achieve for slowly-changing or limited-alphabet values.

Lz4 and Zstd are excellent compression engines. Lz4 really shines for fast decompression. There is no excuse for zlib/gz compression anymore.

1 comments

rramadass 2504 days ago

Thanks for the hints. For one product that i had worked on, we had something like the above for runtime logging/debugging. A shared memory area (i.e. address range in Linux process space where shared libraries are loaded) at a fixed address was reserved via linker scripts with each process having its ring buffer at its own fixed offset. A separate reader process interacted with the CLI to provide comprehensive access to this data. It was all robust and worked quite well.

Unfortunately, these sorts of practical techniques are not known to many programmers and it would be nice if somebody (eg. you :-) were to list it on a website/book with some sample code for everybody's benefit.

link