by PabloOsinaga 5394 days ago
I wonder - has anyone ever tried the approach of the entire VM binary image being the repo and doing smart bindiffs as changes? The trick would be to identify what is "code/config/program code" vs random/temp files.

In code tracking you do it manually by saying "don't track". Here you'd need a bit more intelligence, but it seems doable.

And then you could create something like a git for machines.

I may be too simplistic / not fully understand the stackrocket/blueprint/vagrant approach - but why not do something radically simpler?
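A minimal sketch of the "smart bindiff" idea, assuming the image is treated as fixed-size chunks compared by hash. The chunk size and the function names (`chunk_hashes`, `changed_chunks`, `apply_delta`) are made up for illustration, and a real disk image format would need more care than this:

```python
import hashlib

CHUNK_SIZE = 4096  # assumed block size, purely illustrative


def chunk_hashes(image: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Hash every fixed-size chunk of an image."""
    return [
        hashlib.sha256(image[i:i + chunk_size]).hexdigest()
        for i in range(0, len(image), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes,
                   chunk_size: int = CHUNK_SIZE) -> dict[int, bytes]:
    """Return {chunk_index: new_bytes} for chunks that differ or are new."""
    old_hashes = chunk_hashes(old, chunk_size)
    delta = {}
    for index, offset in enumerate(range(0, len(new), chunk_size)):
        chunk = new[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            delta[index] = chunk
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                chunk_size: int = CHUNK_SIZE) -> bytes:
    """Rebuild the new image from the old image plus the changed chunks."""
    # Truncate or zero-pad the old image to the new length first.
    image = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, chunk in delta.items():
        start = index * chunk_size
        image[start:start + len(chunk)] = chunk
    return bytes(image)
```

Only the changed chunks would need to be stored per "commit". Telling code/config apart from random/temp files is not handled here, since a block-level diff has no notion of files.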

2 comments

You are right, having a binary diffing mechanism would make the overall system simpler, but the diffing mechanism itself would be hard to implement!

We thought about going pure diff at the beginning and decided that implementing that method would give us less flexibility in the short term.

I speculate that to do so, we would need to implement our own virtual machine (we are using VirtualBox) and/or disk image format.

Mmm, I take your point - what ideas are out there to do that, though? I agree it is hard ...

The good news is that for the "known/common stuff" you can always have a central database of the random/temp stuff they generate.

E.g., mysql generates tmp files here and there, and so on - so you could profile all that common stuff in that way.

Then for the uncommon stuff or your custom things, you declaratively say "do not track" as you do with code today.

Moreover, you can do it in a way that is crowdsourced - i.e., if it's a cloud service, when people declare that such and such in mongodb is random/temp, then you learn for all users.

At the end of the day there is a limited number of things people use and for the long tail it is OK for people to be declarative I guess.
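To make the profile-database idea concrete, here is a rough sketch, assuming glob-style patterns. `SHARED_PROFILES`, its paths, and the helper names are all hypothetical; the paths are illustrative guesses, not verified defaults of any of these programs:

```python
from fnmatch import fnmatch

# Hypothetical shared (possibly crowdsourced) database:
# application name -> glob patterns for files it treats as random/temp.
SHARED_PROFILES = {
    "mysql": ["/tmp/mysql*", "/var/lib/mysql/*.tmp"],
    "generic": ["/tmp/*", "/var/cache/*"],
}


def build_ignore_list(installed_apps, user_rules):
    """Combine shared profiles for installed apps with the user's own rules."""
    patterns = list(SHARED_PROFILES["generic"])
    for app in installed_apps:
        # Apps without a profile are the long tail: the user declares those.
        patterns.extend(SHARED_PROFILES.get(app, []))
    patterns.extend(user_rules)
    return patterns


def is_tracked(path, ignore_patterns):
    """A path is tracked unless some ignore pattern matches it."""
    return not any(fnmatch(path, pattern) for pattern in ignore_patterns)


patterns = build_ignore_list(["mysql"], user_rules=["/home/me/scratch/*"])
print(is_tracked("/etc/nginx/nginx.conf", patterns))
print(is_tracked("/var/lib/mysql/sort.tmp", patterns))
```

The user rules play the role of "do not track" in code repos; the shared profiles cover the common stuff so most people never have to write them.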

But on the flip side I can see how that could end up being a nightmare.

But wouldn't it be nice to do something as simple as pull/commit/push for general purpose computers?

That piece is the less difficult one, I think - you could always use existing VM infrastructure and do (1) restore, (2) apply changes and (3) save back - I wouldn't mess with disk formats and so on - no need.
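The restore / apply changes / save back loop could look roughly like this, assuming machine state is modeled as a plain mapping of path to file contents. `MachineRepo` and its methods are invented for illustration and do not correspond to any existing tool:

```python
State = dict[str, bytes]  # path -> file contents (a deliberately naive model)


def diff_state(base: State, current: State) -> dict:
    """Compute what changed between two states."""
    changed = {p: data for p, data in current.items() if base.get(p) != data}
    deleted = [p for p in base if p not in current]
    return {"set": changed, "delete": deleted}


def apply_changes(base: State, changes: dict) -> State:
    """Apply one change set on top of a state, returning a new state."""
    state = dict(base)
    state.update(changes["set"])
    for path in changes["delete"]:
        state.pop(path, None)
    return state


class MachineRepo:
    """Stores a base image plus an ordered list of change sets."""

    def __init__(self, base_image: State):
        self.base_image = dict(base_image)
        self.commits: list[dict] = []

    def checkout(self, version=None) -> State:
        """(1) restore the base, then (2) apply changes up to `version`."""
        if version is None:
            version = len(self.commits)
        state = dict(self.base_image)
        for changes in self.commits[:version]:
            state = apply_changes(state, changes)
        return state

    def commit(self, current: State) -> int:
        """(3) save back: record only the difference from the latest version."""
        self.commits.append(diff_state(self.checkout(), current))
        return len(self.commits)
```

With an existing VM stack, `checkout` would map to restoring a snapshot and replaying changes, and `commit` to saving the delta, so no custom disk format is needed for this part.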
Thanks for your thoughts, PabloOsinaga.

I am just spitting out ideas without deep thought but a discussion might prove fruitful.

It could come down to certain types of data: some to be ignored, some to be careful with (like data or code), others to blindly overwrite (critical security updates).

The point would be to treat the data as "dumb", but to keep in mind that some data are "dumber" than others.

Perhaps if we had our own disk image format, we could mark certain types of data to be ignored for instance.

Our VM would know what to mark as ignored for, say, POSIX systems, and that aspect could be configurable.
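One way to picture the "some data are dumber than others" idea is a small policy table consulted when applying an incoming change. This is only a sketch: the `Policy` names, the default rules, and the paths are assumptions made up for illustration:

```python
from enum import Enum
from fnmatch import fnmatch


class Policy(Enum):
    IGNORE = "ignore"        # never tracked, incoming copies are skipped
    PRESERVE = "preserve"    # be careful: keep the local copy if one exists
    OVERWRITE = "overwrite"  # blindly take the incoming copy


# Hypothetical defaults for a POSIX-like system; first matching rule wins.
DEFAULT_RULES = [
    ("/tmp/*", Policy.IGNORE),
    ("/var/lib/*", Policy.PRESERVE),
    ("/usr/lib/*", Policy.OVERWRITE),
]


def policy_for(path, rules=DEFAULT_RULES):
    """Look up the policy for a path, falling back to the careful option."""
    for pattern, policy in rules:
        if fnmatch(path, pattern):
            return policy
    return Policy.PRESERVE


def merge(local, incoming, rules=DEFAULT_RULES):
    """Apply incoming files onto local ones according to each path's policy."""
    result = dict(local)
    for path, data in incoming.items():
        policy = policy_for(path, rules)
        if policy is Policy.IGNORE:
            continue
        if policy is Policy.OVERWRITE or path not in local:
            result[path] = data
    return result
```

Passing a different `rules` list is where the "configurable" part would come in.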

That's exactly what Cloudlets does. http://github.com/dotcloud/cloudlets
That's nice. It seems to be the infrastructure required for building such a system. The question is how do you go about building the manifests? Do users need to do that?
I think there needs to be a higher level abstraction.

Blueprint is a nice step in that direction: make a change and dump out the environment.

Perhaps what we need is a system that compiles to that manifest, or to chef/puppet.

Setting up systems is a complex task that neither puppet nor chef solves; they make it less daunting, but it is still a real pain to do!

or is your perspective that something as simple as this https://github.com/dotcloud/cloudlets/blob/master/sample.clo...

would work?

Wouldn't the system become a nightmare once I start installing/changing stuff in the machine in a very organic way?

Meaning, I'd have to go and maintain that manifest?

In practice, most of the work happens in the base image. Future versions simply extend the manifest with a few extra lines.

Ideally you want to "compile down" to cloudlets. With smart change management tools you can make it really simple for the user. That sounds exactly like Stackrocket's cup of tea :)

OK - I see. This is a PaaS play in a way, but one that offers a bit more flexibility/componentization than the "traditional" PaaSes.

I was thinking more along the lines of something truly generic.

Didn't see your comment. Yes indeed, that's exactly what we're working towards =)