by agwa
524 days ago
> If you let a disk run full weird shit happens.

Only in buggy software that ignores errors from system calls. Obviously you can expect availability problems when you run out of space, but there's no excuse for losing data from committed transactions given that the OS will reliably report the error.

> So I do strongly hope that besides changing software, they added some disk space monitoring.

One of the action items in their incident report was improving monitoring and alerting: https://groups.google.com/a/chromium.org/g/ct-policy/c/038B7...
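To make the point about not ignoring errors from system calls concrete, here is a minimal Python sketch of a write path that reports a commit only after both the write and the fsync succeed. The names `durable_append` and `CommitFailed`, and the injectable `_write`/`_fsync` hooks, are hypothetical and exist only for illustration; this is not how any particular database implements its commit path.

```python
import errno
import os


class CommitFailed(Exception):
    """Raised when a record could not be made durable (hypothetical name)."""


def durable_append(path, payload, _write=os.write, _fsync=os.fsync):
    """Append payload to path; return only after write and fsync both succeed.

    _write and _fsync are injectable purely so the failure path can be
    exercised without actually filling a disk.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = _write(fd, view)
            view = view[written:]
        # Data is only durable once fsync has returned without error.
        _fsync(fd)
    except OSError as exc:
        # The OS reported the problem; surface it instead of pretending
        # the record was committed.
        reason = "disk full" if exc.errno == errno.ENOSPC else str(exc)
        raise CommitFailed(f"not committed: {reason}") from exc
    finally:
        os.close(fd)
    return len(payload)
```

A caller that sees `CommitFailed` knows the record must not be acknowledged as committed. A real storage engine would additionally have to undo any partial write, which this sketch does not attempt.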
For example, the author says "Whilst its primary key index still includes the ~1,300 unsequenced rows, the table itself no longer contains them." This is simply impossible in InnoDB, which uses a clustered index: in InnoDB's design, the primary key B-tree literally is the table. The statement in the incident report is complete nonsense.
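A toy model may help show why that state cannot exist with a clustered index. The two classes below are purely illustrative (plain dicts standing in for B-trees and data files, with hypothetical names), not InnoDB's actual implementation. The heap variant is loosely how engines that keep data and indexes in separate files, MyISAM among them, are organized.

```python
class HeapTable:
    """Rows live in a separate heap; the PK index only stores row locations."""

    def __init__(self):
        self.heap = {}      # row_id -> row
        self.pk_index = {}  # primary key -> row_id
        self._next_row_id = 0

    def insert(self, pk, row):
        row_id = self._next_row_id
        self._next_row_id += 1
        self.heap[row_id] = row
        self.pk_index[pk] = row_id

    def lookup(self, pk):
        return self.heap[self.pk_index[pk]]


class ClusteredTable:
    """The primary key structure holds the rows themselves."""

    def __init__(self):
        self.pk_tree = {}  # primary key -> row (stand-in for B-tree leaves)

    def insert(self, pk, row):
        self.pk_tree[pk] = row

    def lookup(self, pk):
        return self.pk_tree[pk]


heap = HeapTable()
heap.insert(1, {"name": "a"})
heap.heap.clear()               # simulate losing the data file only
dangling = 1 in heap.pk_index   # the index still lists the key, the row is gone

clustered = ClusteredTable()
clustered.insert(1, {"name": "a"})
del clustered.pk_tree[1]        # removing the index entry removes the row itself
```

In the heap layout the index and the data can disagree, which matches the symptom described in the report. In the clustered layout there is no separate place for the rows to go missing from.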
Also, if the storage volume runs out of space, InnoDB rolls back the statement and returns an error. See https://dev.mysql.com/doc/refman/5.7/en/innodb-error-handlin... (MySQL 5.7 doc, but this also aligns with MariaDB's behavior for this situation.)
So maybe they were using MyISAM, which is ancient, infamously non-crash-safe, and should pretty much never be used for anything of consequence. That has been well known in the database world for 15+ years. Who knows, they didn't say.
The report directly says "Our teams have very little expertise with MySQL/MariaDB", which also makes me question their conclusion "that it is not possible" to restore/recover.
Also, this whole incident happened to Sectigo; take from that what you will.
The one thing I'd criticize MariaDB for here is that it lacks an additional safety mechanism which MySQL has: by default (binlog_error_action=ABORT_SERVER) MySQL will shut down the database server if a write to the binary log cannot be successfully made, ensuring that any writes on the primary can be replicated as well. (iirc this was a contribution from Facebook.)