by willvarfar
2107 days ago
I’ve run into serious house-burning-down problems with MyRocks too. A simple recipe to crash MySQL in a way that is unrecoverable: run ALTER TABLE on a big table; it runs out of RAM, crashes, and then refuses to restart, ever. Googling shows that people have reported the same restart error several times on lists and elsewhere. What help is it to report it to MariaDB or the like? Do FB notice? It seems not. Here’s hoping someone at FB browses HN... I don’t get why FB doesn’t have some fuzzing and chaos monkey stress tests to find easy stability bugs :(
1) Schema Changes by DDL (e.g. ALTER TABLE, CREATE INDEX)
2) Recovering primary instances without failover
We use our own open source tool OnlineSchemaChange to do schema changes (details: https://github.com/facebook/mysql-5.6/wiki/Schema-Changes), which is heavily optimized for MyRocks use cases, such as utilizing bulk loading for both primary and secondary keys. ALTER TABLE / CREATE INDEX support in MyRocks is limited and suboptimal: it does not support Online/Instant DDL (so writes to the same table are blocked during ALTER), and it takes the non-bulk-loading path and tries to load the entire table in one transaction, which may hit the row lock count limit or run out of memory. We have plans to improve the regular DDL paths in MyRocks in MySQL 8.0, including support for atomic, online and instant schema changes.
I am also realizing that a lot of external MySQL users still don't have automatic failover and try to recover primary instances when they go down. This means single-instance availability and recoverability are much more important for them. We set rocksdb_wal_recovery_mode=1 (kAbsoluteConsistency) by default in MyRocks, which actually degraded recoverability (a higher chance of refusing to start even when the instance could be recovered from the binlog). We're changing the default to 2 (kPointInTimeRecovery) so that it is more robust without relying on replicas for recovery.
Hitting OOM because of 1) and then failing to restart because of 2) would have been a really bad experience. We have relationships with MariaDB and Percona, and will make the default behavior better for users.
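For anyone running a single instance before the new default ships, the recovery mode can be set explicitly. This is a minimal sketch, assuming a standard my.cnf with a [mysqld] section; the file location and section layout are assumptions about your installation, and only the variable name and its values come from the comment above.

```ini
# my.cnf fragment (assumed layout; adjust to your installation)
[mysqld]
# 1 = kAbsoluteConsistency (the default described above)
# 2 = kPointInTimeRecovery (the planned new default)
rocksdb_wal_recovery_mode=2
```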