by jart
695 days ago
Cosmopolitan Libc's --ftrace logs 1 million lines per second on my workstation for a program written in C, e.g. `python -c "print('hello')" --ftrace`. If I run clang-19 --ftrace, which is written in C++, it does 476 thousand lines per second. That's about half as fast, because the Cosmo Libc privileged runtime has to demangle C++ symbols before printing them in the log. Note I'm using `export KPRINTF_LOG=log`. It's hard to believe it goes so fast, considering it's a single thread doing this, and kisdangerous() has to lock the memory manager and search a red-black tree every time kprintf() or ftracer() dereferences a pointer (e.g. when unwinding stack frames to print space indentation). If the Linux kernel's write() syscall didn't have so much overhead, it'd probably go faster.
A separate decode step, instead of writing straight to the console, is not so bad: the log throughput is already too high to read live, so switching to a post-mortem dump with a dedicated decode step opens up a lot of options for efficiency.
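To make the encode/decode split concrete, here is a minimal sketch in C. The record layout and the names `trace_rec`, `trace_encode`, and `trace_decode` are hypothetical, chosen only for illustration; this is not Cosmopolitan's actual format. The hot path copies a small fixed-size binary record into a buffer, and all text formatting (symbol lookup, indentation) is deferred to a decode pass that can run after the fact.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size trace record (an assumption for this sketch). */
struct trace_rec {
    uint32_t sym_id; /* index into a symbol table, resolved at decode time */
    uint32_t depth;  /* call depth, turned into indentation at decode time */
};

/* Hot path: append one binary record to buf.
 * Returns the new write offset, or off unchanged if the buffer is full. */
size_t trace_encode(unsigned char *buf, size_t cap, size_t off,
                    uint32_t sym_id, uint32_t depth) {
    struct trace_rec r = {sym_id, depth};
    assert(off <= cap);
    if (cap - off < sizeof r) return off;
    memcpy(buf + off, &r, sizeof r);
    return off + sizeof r;
}

/* Cold path: turn the binary records back into indented text lines.
 * Returns the number of records decoded. */
size_t trace_decode(const unsigned char *buf, size_t len,
                    const char *const *symtab, size_t nsyms, FILE *out) {
    size_t n = 0;
    for (size_t off = 0; len - off >= sizeof(struct trace_rec);
         off += sizeof(struct trace_rec)) {
        struct trace_rec r;
        memcpy(&r, buf + off, sizeof r);
        /* Unknown ids decode to "?" rather than reading out of bounds. */
        const char *name = r.sym_id < nsyms ? symtab[r.sym_id] : "?";
        fprintf(out, "%*s%s\n", (int)(r.depth * 2), "", name);
        n++;
    }
    return n;
}
```

The point of the split is that the traced program only pays for a bounds check and an 8-byte copy per event; name lookup and formatting happen in the decoder, which could just as well be a separate tool reading a dump file.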
If you want to go further: since you would no longer emit directly to the console, you can relax flushing requirements and batch more data before persisting. You could also use a dedicated log, make it single-producer/single-consumer, or optimize the logging itself in other ways.
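As a rough illustration of the single-producer/single-consumer and batching ideas, here is a sketch of a lock-free ring buffer using C11 atomics. The names `spsc_ring`, `ring_push`, and `ring_drain`, the 64-bit event type, and the capacity are all assumptions made for this example. It is only correct with exactly one producer thread and one consumer thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 1024u /* hypothetical capacity; must be a power of two */

struct spsc_ring {
    _Atomic size_t head; /* next slot to read; written only by the consumer */
    _Atomic size_t tail; /* next slot to write; written only by the producer */
    uint64_t slots[RING_CAP];
};

/* Producer side: enqueue one encoded event. Returns false if the ring is full. */
bool ring_push(struct spsc_ring *r, uint64_t ev) {
    assert(r != NULL);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP) return false;
    r->slots[tail & (RING_CAP - 1)] = ev;
    /* Release: the slot write above becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: copy up to max pending events into out in one batch.
 * Returns the number of events copied. */
size_t ring_drain(struct spsc_ring *r, uint64_t *out, size_t max) {
    assert(r != NULL && out != NULL);
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t n = tail - head;
    if (n > max) n = max;
    for (size_t i = 0; i < n; i++)
        out[i] = r->slots[(head + i) & (RING_CAP - 1)];
    /* Release: the slots are free for reuse only after they were copied out. */
    atomic_store_explicit(&r->head, head + n, memory_order_release);
    return n;
}
```

The consumer can then persist each drained batch with a single large write instead of one syscall per event, which is where relaxing the flushing requirement pays off.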
You could probably get 10x on just basic encoded formatting. The next 10x would likely require logging and storage optimizations. It is likely that your OS and storage devices would become your limiting factor at that point. The last 10x to get to the billion events per second per core level is turbo black magic and requires substantive changes to the compiler, operating system, and runtime to achieve.