by simula67
1020 days ago
> While manual backup and restore tests were run once a month to ensure our backups were functioning, they were run manually. After digging into why our restores were not coming up with data, I found that our recurring backups were missing the flag to run volume backups with Restic which snapshots PVC block volume data.

Can someone explain this? How did they test restores, if the actual restore failed to come up with data?

They were running for years on the cusp of total failure and had automated restoration tests that caused a false sense of security in the tooling.
The second thing I did was adjust the restoration tooling to validate that data actually existed, and over time I added validation tests (the percentage of data matching the current live systems, specific fields and values being present, etc.).
It's just too easy to screw up, doubly so when you're time-constrained and working alone, doing the best you can without any oversight.
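
For anyone wondering what that kind of check might look like, here is a rough sketch in Python. It is not the tooling described above; `validate_restore`, its parameters, and the 95% threshold are all made up for illustration, and it assumes you can load both the live and the restored records into dicts keyed by some ID. A backup is always a bit older than live, so the match threshold sits below 100%.

```python
from typing import Hashable, List, Mapping, Sequence

Row = Mapping[str, object]


def validate_restore(
    live: Mapping[Hashable, Row],
    restored: Mapping[Hashable, Row],
    required_fields: Sequence[str],
    min_match_ratio: float = 0.95,  # hypothetical threshold, tune per system
) -> List[str]:
    """Return a list of problems; an empty list means the restore passed."""
    # 1. Existence: a restore that "succeeds" with no data is a failure.
    if not restored:
        return ["restore came up with no data"]

    problems: List[str] = []

    # 2. Coverage: what share of live records came back identical?
    if live:
        matched = sum(1 for key, row in live.items() if restored.get(key) == row)
        ratio = matched / len(live)
        if ratio < min_match_ratio:
            problems.append(
                f"only {ratio:.0%} of live records matched "
                f"(need at least {min_match_ratio:.0%})"
            )

    # 3. Spot checks: specific fields must be present and non-empty.
    for key, row in restored.items():
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            problems.append(f"record {key!r} is missing {', '.join(missing)}")

    return problems
```

The point is that the test fails loudly on an empty restore instead of only checking that the restore job exited cleanly.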