| HN Mirror

Yeah, if you instead dumped a dense binary log format into the log and then demangled/decoded on display, then you should be able to get much higher throughput.

A separate decode step instead of straight to the console is not so bad because the log throughput is already high enough that you can not read it live, so switching to a post-mortem/dump with a dedicated decode step offers you a lot of options for efficiency.

If you want to go further, since you would no longer emit it directly into the console, you can relax flushing requirements and batch more data before persisting. You could also go for a dedicated log, make it single-producer/single-consumer, or other ways of optimizing the logging itself.

You could probably get 10x on just basic encoded formatting. The next 10x would likely require logging and storage optimizations. It is likely that your OS and storage devices would become your limiting factor at that point. The last 10x to get to the billion events per second per core level is turbo black magic and requires substantive changes to the compiler, operating system, and runtime to achieve.