by yencabulator
726 days ago
> Since everyone is treating containers as cattle CRIU doesn't seem to get much attention

Nah, it's more like "I don't trust that thing to not cause weird behavior in production". VM-level snapshots are standard practice[1] because the abstraction there is right-sized for being able to do that reliably. CRIU isn't, because it's trying to solve a much harder problem.

[1]: And even there, beware cloning running memory state, you can get weird interactions from two identical parties trying to talk to the same 3rd service, separated by time. Cloning disk snapshots is much safer, and even there you can screw up because of duplicate machine IDs, crypto keys, nonces, etc.
I'm sure there are some niche applications for container checkpointing, but I don't really see the complexity being worth it. Maybe checkpointing some long-running batch jobs could save you some money, but you should just make your jobs checkpoint their state to an external store such as Ceph or S3, and make the jobs smart enough to load that state from the store when they restart after being preempted.
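
For what it's worth, application-level checkpointing along those lines can be pretty small. A rough sketch in Python, with loud assumptions: a local file path stands in for the external store (a real job would write to Ceph or S3 through whatever client it already uses), the state layout (`next_item`, `total`) is made up for illustration, and `stop_after` only exists to simulate a preemption.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a reader never sees a half-written checkpoint.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0, "total": 0}


def run_job(path, items, stop_after=None):
    """Sum the items, checkpointing after each one.

    stop_after is a hypothetical knob that simulates preemption:
    the job returns None after processing that many items in this run.
    """
    state = load_checkpoint(path)
    processed = 0
    for i in range(state["next_item"], len(items)):
        if stop_after is not None and processed >= stop_after:
            return None  # simulated preemption
        state["total"] += items[i]
        state["next_item"] = i + 1
        save_checkpoint(path, state)
        processed += 1
    return state["total"]
```

A preempted run leaves the checkpoint behind, and the next `run_job` call with the same path resumes from `next_item` instead of starting over. The point is that the job decides what state matters, so there is no process memory, open sockets, or kernel state to restore.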