CRIU is definitely neat, but can someone tell me what the use case is for it? Perhaps more specifically what is the use case in the era of microservices and designing for failure?
The main scenario is live migration of containers. We have had a live migration feature in OpenVZ since 2005, but it was always an in-kernel implementation of checkpointing Linux processes (https://openvz.org/Checkpointing_internals). The Linux kernel developers would not accept our patches into the vanilla kernel, so we decided to implement checkpoint/restore (C/R) in userspace.
Other scenarios are listed here: https://criu.org/Usage_scenarios
Some people are trying to use it for checkpoint/restore and hot migration of containers. For example, if you have a slow-starting app in a container, people are trying to use CRIU to create a base container checkpoint in which the slow app is already started, and then restore from that checkpoint to spin up worker containers quickly.
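A rough sketch of that dump-once, restore-later flow, in case it helps. It only builds the `criu` command lines; the PID, the image directory, and the helper names `criu_dump_cmd` and `criu_restore_cmd` are made up for illustration. It assumes the app is a plain process tree on the host, and a real container setup would more likely go through the container runtime's own checkpoint support.

```python
import shlex


def criu_dump_cmd(pid, images_dir, shell_job=False):
    """Build a `criu dump` command for the process tree rooted at `pid`."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if shell_job:
        # Only needed when the process is attached to a terminal.
        cmd.append("--shell-job")
    return cmd


def criu_restore_cmd(images_dir, shell_job=False):
    """Build a `criu restore` command that detaches once the tree is restored."""
    cmd = ["criu", "restore", "--images-dir", images_dir, "--restore-detached"]
    if shell_job:
        cmd.append("--shell-job")
    return cmd


if __name__ == "__main__":
    # Hypothetical values: PID of the warmed-up app and where its images go.
    print(shlex.join(criu_dump_cmd(1234, "/var/tmp/warm-app")))
    print(shlex.join(criu_restore_cmd("/var/tmp/warm-app")))
```

Both commands normally need root (or the right capabilities). Restoring the same images several times on one host will collide on PIDs unless each restore lands in its own PID namespace, which is one reason this is usually done per container.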
Running long-running jobs on EC2 spot instances. It's not uncommon for a bioinformatics pipeline to have a multi-day runtime. If you can checkpoint every hour or so, an interruption costs you at most about an hour of work, so you can use spot instances and save ~80% of your compute costs.
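Something like the loop below is what I have in mind for the hourly checkpoint. Treat it as a sketch under assumptions: the function name, the `ckpt-NNNN` directory layout, and the injectable `run`/`sleep` parameters are my own choices, and `images_root` is assumed to live on storage that survives the instance (otherwise the checkpoint disappears along with it).

```python
import os
import subprocess
import time


def checkpoint_periodically(pid, images_root, interval_s=3600, rounds=None,
                            run=subprocess.run, sleep=time.sleep):
    """Dump the process tree rooted at `pid` every `interval_s` seconds.

    The job keeps running after each dump (`--leave-running`). Returns the
    list of image directories written. `rounds=None` means loop until the
    dump fails, e.g. because the job has exited.
    """
    taken = []
    n = 0
    while rounds is None or n < rounds:
        sleep(interval_s)
        images_dir = os.path.join(images_root, f"ckpt-{n:04d}")
        os.makedirs(images_dir, exist_ok=True)  # criu expects the directory to exist
        cmd = ["criu", "dump", "--tree", str(pid),
               "--images-dir", images_dir, "--leave-running"]
        run(cmd, check=True)  # raises CalledProcessError if the dump fails
        taken.append(images_dir)
        n += 1
    return taken
```

After an interruption you would start a new instance and run `criu restore --images-dir <latest ckpt dir>` against the most recent complete checkpoint. Each dump here is a full one, so old directories need pruning for jobs with a large memory footprint.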