Hacker News new | ask | show | jobs
by ilyt 1285 days ago
I'd recommend running strace on that journalctl command, you might be surprised what happens. It's even worse if you actually store journalctl logs on disk (most distros defer to storing it in ram). I have it set to store on disk and keep at most 1GB of logs

On my NAS

    # strace journalctl -u nginx 2>&1 |grep open |wc -l
    827
and curiously

    # strace journalctl -u baked-potato 2>&1 |grep open |wc -l
    827
(no i do not have baked-potato service on that machine)

It's DB implementation is utter garbage. It doesn't even organize files by time or really anything useful. If the binary database was a SQLite file with columns containing app name/log level/etc. it would actually be useful but current one is just shit.

Bonus round:

    #:/var/log/journal echo 3 >/proc/sys/vm/drop_caches
    #:/var/log/journal time journalctl -u nginx >/dev/null 2>&1

    real 0m3,232s
    user 0m0,036s
    sys 0m0,292s
    #:/var/log/journal echo 3 >/proc/sys/vm/drop_caches
    #:/var/log/journal time ag -R nginx . >/dev/null 2>&1

    real 0m2,531s
    user 0m0,004s
    sys 0m0,290s
slower than actually searching thru files directly...
1 comments

Why would it care about stracing? It was never too slow for me. A "bad" performance might be caused by ex. consistency checking. Benchmarks like these don't tell much - you can't reduce software performance to one number.

> It's DB implementation is utter garbage. It doesn't even organize files by time or really anything useful.

That sounds interesting, can you please elaborate on the internal structure of journald files or link to further documentation? And why would I care if journald handles it for me?

> Why would it care about stracing? It was never too slow for me. A "bad" performance might be caused by ex. consistency checking. Benchmarks like these don't tell much - you can't reduce software performance to one number.

Well, I was routinely getting multi-second wait for operations like systemctl status servicename which is not something rare, because it didn't even kept the pointer to file that has last few lines of app's logs. It's not just benchmarks

It also trashes cache with hundreds of MBs of logs instead of stuff apps running on server actually needs. An equivalent of "tail -f" can go thru hundreds of megabytes of binary logs.

I don't have a negative experience with journald file format. That said, the format is documented here:

https://systemd.io/JOURNAL_FILE_FORMAT/

On the other hand I agree that journald seems to call open syscall more often than I would originally expect, which can be a problem for some edge cases[1], but I don't consider this to be a real problem.

[1] https://blog.marbu.eu/posts/2022-11-20-ebpf-journald/

The problem is:

If cache is hot, you have hundreds of MBs of systemd logs littering your buffer cache, while it could be used for something more useful

If cache is cold simple operations like systemctl status take up to tens of seconds, especially on loaded servers. Example:

    # time systemctl status influxdb >/dev/null

    real 0m11.913s
    user 0m0.013s
    sys 0m0.531s
It doesn't even keep index of "last file where app wrote logs" which causes above.
But this buffer cache usage is caused by readers/clients checking logs for some reason, not logging itself, right? I see no significant difference compared to syslog here: if I grep all logs for something, it will also place these logs into buffer cache.

Your example with dropping cache and running systemctl status is indeed interesting, 11s looks like too much. But it's a number without a context, and I wonder how big a problem this actually is. I haven't noticed it myself before.

> It doesn't even keep index of "last file where app wrote logs" which causes above.

While I definitely see some room for optimization, I'm not sure what you mean here: journald uses one active journald database file, there should not be a problem with figuring out where the file is, should it?