When I first heard the developer of CRIU interviewed on FLOSS Weekly, it blew my mind.
Going from "Torvalds said I will not merge this crazy talk" to being able to halt active TCP connections and then horse whisper them to another container (this is popular with OpenVZ) made my jaw drop while I was driving. I realized then that I will never understand a Unix system well enough to be taken seriously, and this guy is hot migrating them, long before this container stuff was cool!
Quality as in "no bugs" is good, but that doesn't mean it's quality _code_. I've seen bloated, horrible code pass static analysis, and less bloated but equally horrible code pass cyclomatic complexity tests. Sadly, I don't think there is an automated way to gauge quality from a programmer's perspective.
And the reason for that is that quality is a relative metric, which is almost impossible to quantify. It could mean any of these:
* Few bugs per thousand lines of code, as measured by static analysis
* Idiomatic code that adopts language-specific preferences
* Code that is easy to maintain but slow as molasses
* Code that works well enough and was written super fast
* Code that is micro-optimized to the extreme and super fast, but difficult to maintain due to inline ASM
For certain cases, each of these could be considered quality or atrocity. It is all about context.
CRIU is definitely neat, but can someone tell me what the use case is for it? Perhaps more specifically what is the use case in the era of microservices and designing for failure?
The main scenario is live migration of containers. We have had a live migration feature in OpenVZ since 2005, but it was always an in-kernel implementation of checkpointing Linux processes (https://openvz.org/Checkpointing_internals). The Linux kernel developers wouldn't accept our patches into the vanilla kernel, so we decided to implement C/R (checkpoint/restore) in userspace.
Other scenarios are listed here: https://criu.org/Usage_scenarios
Some people are trying to use it for checkpoint/restore and hot migration of containers. For example, if you have a slow-starting app in a container, people are trying to use CRIU to checkpoint a base container in which the slow app has already started, and then spin up worker containers from that checkpoint quickly.
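A minimal sketch of that base-checkpoint pattern using the plain `criu` CLI, assuming the warmed-up app is an ordinary process tree whose root PID you know (the PID `1234` and the directory `/tmp/base-ckpt` below are hypothetical). It only builds and prints the commands; in practice a container runtime (e.g. `docker checkpoint` or `podman container checkpoint`) usually drives CRIU for you.

```python
import shlex
from pathlib import Path


def criu_dump_cmd(pid: int, images_dir: Path, leave_running: bool = True) -> list[str]:
    """Command to checkpoint the process tree rooted at `pid` into `images_dir`."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", str(images_dir)]
    if leave_running:
        # Keep the warmed-up base process alive after the dump instead of killing it.
        cmd.append("--leave-running")
    # Add "--shell-job" if the process was started from an interactive shell.
    return cmd


def criu_restore_cmd(images_dir: Path) -> list[str]:
    """Command to restore a process tree from `images_dir` and detach from it."""
    return ["criu", "restore", "--images-dir", str(images_dir), "--restore-detached"]


if __name__ == "__main__":
    base = Path("/tmp/base-ckpt")  # hypothetical checkpoint directory
    # Dry run: print the commands instead of executing them. To run them,
    # pass each list to subprocess.run(..., check=True) with root privileges.
    print(shlex.join(criu_dump_cmd(1234, base)))  # 1234: hypothetical PID of the slow app
    for worker in range(3):
        # Each restore needs its own PID namespace (e.g. its own container),
        # because CRIU restores processes with their original PIDs.
        print(f"# worker {worker}:", shlex.join(criu_restore_cmd(base)))
```

The design point is that the expensive startup happens once, before the dump; every worker then pays only the restore cost.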
Running long-running jobs on EC2 spot instances. It's not uncommon for a bioinformatics pipeline to have a multi-day runtime. If you can checkpoint every hour or so, you can use spot instances and save ~80% of your compute costs.
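A back-of-the-envelope sketch of why the checkpoint interval matters, under loudly hypothetical assumptions: an 80% spot discount (matching the ~80% figure above), a 72-hour pipeline, 5 interruptions, each interruption throwing away at most one checkpoint interval of work that must be redone at spot prices, and zero checkpoint/restore overhead.

```python
def effective_savings(spot_discount: float, runtime_h: float,
                      interruptions: int, checkpoint_interval_h: float) -> float:
    """Worst-case fraction of on-demand cost saved by running on spot instances.

    Hypothetical model: each interruption loses at most one checkpoint
    interval of work, which is recomputed (and billed again) after restore.
    Checkpoint/restore overhead and restart delays are ignored.
    """
    if not 0 <= spot_discount < 1:
        raise ValueError("spot_discount must be in [0, 1)")
    lost_h = interruptions * checkpoint_interval_h   # worst-case redone work
    billed_h = runtime_h + lost_h                     # hours paid at spot price
    cost_ratio = (1 - spot_discount) * billed_h / runtime_h
    return 1 - cost_ratio


if __name__ == "__main__":
    # Hypothetical 3-day pipeline with 5 spot interruptions.
    for interval in (1.0, 12.0):
        saved = effective_savings(0.8, 72, 5, interval)
        print(f"checkpoint every {interval:g}h: {saved:.1%} saved")
```

With these made-up numbers, hourly checkpoints keep the savings near 78.6%, while checkpointing every 12 hours drops them to about 63.3%.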