by db48x 1461 days ago
Weird; the systemd journal is the feature I want most! It would be the last thing I would ever consider disabling.

I don't know if this is still an issue but the last time I used journald the logs would occasionally become corrupted and journalctl would refuse to read them. The fix was to just delete the logs. I have no idea how logging got so screwed up that corruption in part of the file could make the rest of the log file unreadable. I mean, it's a journal, it's right in the name.

Ever since then I switched to rsyslogd and the like. Rock solid.

Keep in mind that rsyslog doesn't even attempt to verify logs. An alternative explanation is: my system is corrupting logs, I changed to a logging daemon which doesn't tell me about it.

I mean, there could definitely be a bug in journald, but I haven't seen any fixes mentioned in the changelog for the last 5 years, and if it were happening in standard usage, people would notice.

For recovering corrupted logs - you can still "less" them as usual. They have some extra markers, but the text is available as text. Journalctl has some special options for that too.

IIRC when I experienced this the logs were all in some binary format that couldn't easily be less'd. tbh I didn't do much investigation other than to see the "delete all your logs" resolution suggestion. There could have been a better option.
The fact that it can't seem to recover from a few bad records and gives up on the whole file demonstrates what terrible software it is.
Have you tried it? Journalctl does skip bad entries and prints out the rest automatically. If you've found a case where it doesn't, you should report that as a bug.
It may do that nowadays but it definitely didn't back when I experienced this issue. This matches my experience:

https://www.reddit.com/r/linux/comments/1y6q0l/systemds_bina...

Yes, 8 years ago things had more bugs than today / were less mature. It's silly to call something terrible software today because of that. Everyone can find a pet bug they ran into years ago. shrug
Lots of hardware problems on display, especially suspend and resume, which is notoriously buggy (broken ACPI tables that happen to work in Windows so the hardware manufacturer never noticed they were busted, etc). I recommend spending extra to get ECC RAM, and running ZFS filesystems. Both can catch a number of types of errors before they corrupt your data. With those precautions I haven’t lost any data in many years.

Though one time at work we had a few thousand hard drives from a particular vendor that had an interesting firmware bug. Very, very occasionally they would write a sector with incorrect data. No individual drive did it very often, but after a few thousand full drive writes we noticed it half a dozen times. We also discovered that the garbage data was always the same across all of the drives. Crazy. Sadly we weren’t running ZFS on those systems, which would have caught the problem and corrected it from redundancy. Thankfully we were able to get a refund. Never put your trust in a hard drive.
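
For anyone wondering what "caught the problem and corrected it from redundancy" means in practice, here is a rough Python sketch of the general idea: keep a checksum apart from the data, keep more than one copy, and on read use whichever copy still matches. This is only a toy model, not how ZFS is actually implemented; the `MirroredStore` class and its dict-based "disks" are made up for illustration.

```python
import hashlib


class MirroredStore:
    """Toy block store (hypothetical): two copies of each block plus a
    checksum that is kept separately from the data."""

    def __init__(self):
        self.copies = [{}, {}]   # two pretend disks: block number -> bytes
        self.checksums = {}      # block number -> SHA-256 hex digest

    def write(self, block_no, data):
        self.checksums[block_no] = hashlib.sha256(data).hexdigest()
        for disk in self.copies:
            disk[block_no] = data

    def read(self, block_no):
        expected = self.checksums[block_no]
        for disk in self.copies:
            data = disk[block_no]
            if hashlib.sha256(data).hexdigest() == expected:
                # Found a good copy: overwrite any copy that differs from it.
                for other in self.copies:
                    if other[block_no] != data:
                        other[block_no] = data
                return data
        raise IOError(f"block {block_no}: all copies fail the checksum")


store = MirroredStore()
store.write(7, b"important log data")
store.copies[0][7] = b"garbage from firmware"   # simulate a silent bad write
recovered = store.read(7)      # served from the copy that still verifies
repaired = store.copies[0][7]  # the bad copy has been rewritten
```

Without the separate checksum, the reader has no way to tell which of the two copies is the garbage one, which is why plain mirroring alone would not have helped with a bug like that.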

To get back on topic, I’ve always assumed that journald was reasonably robust against minor corruption, but honestly I’ve never had a reason to test it. At the end of the day no one component of the system is solely responsible for data integrity; every level of the hardware and software must cooperate to prevent corruption else there will be cracks for the data to slip through.
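
To make "robust against minor corruption" concrete, here is a minimal sketch of one way a log format can survive a few bad records: give every record its own checksum so the reader can drop the damaged ones and keep going. This is not journald's on-disk format; the line layout and the `encode_record` and `read_records` helpers are assumptions invented for the example.

```python
import zlib


def encode_record(message):
    """Return one log line: 8 hex digits of CRC32, a space, then the message."""
    payload = message.encode("utf-8")
    return b"%08x " % zlib.crc32(payload) + payload + b"\n"


def read_records(blob):
    """Return (good_messages, skipped_count), skipping lines that fail the check."""
    good, skipped = [], 0
    for line in blob.split(b"\n"):
        if not line:
            continue
        checksum, _, payload = line.partition(b" ")
        try:
            ok = int(checksum.decode("ascii"), 16) == zlib.crc32(payload)
        except ValueError:
            # Covers a checksum field that is not valid ASCII or not valid hex.
            ok = False
        if ok:
            good.append(payload.decode("utf-8", errors="replace"))
        else:
            skipped += 1
    return good, skipped


log = b"".join(encode_record(m) for m in ["boot", "network up", "login"])
damaged = log.replace(b"network", b"netw0rk")   # simulate one corrupted byte
messages, skipped = read_records(damaged)
```

Here `messages` ends up as the two intact records and `skipped` counts the damaged one, so a single bad record costs one entry rather than the whole file.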

Honestly, that sounds more like a disk problem than a systemd problem.