by mitchellh 4619 days ago
I'm jumping on a plane right now (a couple of hours) but I'd be happy to answer any questions related to Serf once I land. Just leave them here and I'll give it my best shot! We've dreamt of something like Serf for quite a while and I'm glad it is now a reality.

Some recommended URLs if you're curious what the point is:

"What is Serf?" http://www.serfdom.io/intro/index.html

"Use Cases" http://www.serfdom.io/intro/use-cases.html

Comparison to Other Software: http://www.serfdom.io/intro/vs-other-sw.html

For the CS nerds, the internals/protocols/papers behind Serf: http://www.serfdom.io/docs/internals/gossip.html

Also, I apologize that the site isn't very mobile-friendly right now. Unfortunately, I can write a Lamport clock implementation, but CSS is just crazytown.

5 comments

A meta question, if you will - I have often come across situations at work where I think "if only we had that tool". Sometimes I have hacked something together; other times I have taken it further, tidied it up, and released it. But this seems to have a high level of polish.

so ...

When did you realise the need for Serf?

Did you work on it as a main project at some point, or is it a side project?

When and how did you decide to commit to getting this done?

And the big one for me - tools are driven by a need, but often the need keeps coming while the time to build it diminishes. What strategies did you use to keep the plates spinning while building Serf?

I think we can all mostly answer the questions - I just want to know how different your answers are from, say, mine, when I don't release two major OSS projects and you do.

cheers

Great questions. I'll answer each in turn.

I want to mention the "polish": I personally don't believe in releasing an open source project without polish. If it is missing docs, it's just not complete. If it is ugly, it is not complete. The technical aspects of Serf were done weeks ago. Getting the human side of things done took another few weeks (contracting designers and such).

> When did you realise the need for serf?

The need for something like Serf has existed since I started doing ops. Every time I hit something where I say to myself "why is this so hard/crappy", I write it down in my notebook for a future date. I then just think on the idea for a while, and eventually, when I feel like I have a significantly better solution than what is out there already, I build it.

I decided to start building Serf when @armon started throwing gossip protocol academic papers at me. I realized he had figured it out and that this was clearly significantly better, so we started working on it.

> Did you work on it as a main project at some point or is it a side project?

To get it out the door, we focus on it for some period of time. After it ships, it is still what I would consider a "main project", but time is split between various projects.

> When and how did you decide to commit to getting this done?

A few weeks ago. It took about a month to build. Building it is easy. Figuring out WHAT to build... took a long time. I have to say I've had "service orchestration/membership" in my notebook for years.

> What strategies did you use to keep the plates spinning while building Serf?

No good answer here; we just prioritize some things over others. Serf was our top priority this month.

Thank you - that "building was easy, compared to knowing what to build" put a lot into perspective. And reaching out to external people to build the polish is a surprise, but obvious in retrospect.

I am afraid that for such a helpful and clear answer, you get a mere 1 karma point from me - but thank you.

I'm still looking for a very simple system that would let me do a live redeploy with a blocking database schema migration in between.

For that I (think I)'d need a system that will:

* start off with X nodes live in the load balancer

* trigger redeploys on half (or so) of them

* when half of the nodes have been redeployed, block and perform the database migration and trigger redeploys for the remaining servers

* after the migration has completed, switch the load balancer over to the now-updated half of the servers

Is this something that you could orchestrate with Serf? Or am I looking in the wrong direction here?

You could build something like this on Serf, but it would take a little creativity. You might do something like this:

1) Send a "pre-deploy" event. Handler scripts use a random number generator to decide which group they are in ("flip a coin", basically).

2) Half the nodes should transition to the "left" state, do the deploy and rejoin the cluster.

3) Once this is done, trigger the migration.

4) Flip the LB to the nodes that did the Join/Leave (you can potentially distinguish them using different role names, or by tracking who left and joined)

5) Run the "post-deploy". The other half of the nodes should now deploy.

6) Update the LB to include everybody as the nodes leave/join

This is of course a rough sketch, but it is certainly possible, if tricky, to build something like this.
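
The coin-flip part of steps 1, 2 and 5 could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the function names, the group labels "first" and "second", the action strings, and the idea of persisting the flip in a file are all hypothetical, and how the event name reaches the handler script is left out entirely. It only shows the decision logic, not any Serf API.

```python
import os
import random


def choose_group(rng=random):
    """Flip a coin to decide which deploy wave this node joins.

    "first" and "second" are hypothetical labels, not Serf concepts.
    """
    return "first" if rng.random() < 0.5 else "second"


def load_or_choose_group(path, rng=random):
    """Return the group stored at `path`, flipping the coin only once.

    A handler script runs once per event, so the result of the flip has
    to be remembered somewhere between "pre-deploy" and "post-deploy".
    A small file is one (assumed) way to do that.
    """
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    group = choose_group(rng)
    with open(path, "w") as f:
        f.write(group)
    return group


def action_for(event, group):
    """Map an event name and this node's group to what the node should do.

    "pre-deploy" and "post-deploy" are the event names from the steps
    above; the returned action strings are placeholders.
    """
    if event == "pre-deploy" and group == "first":
        return "leave-deploy-rejoin"
    if event == "post-deploy" and group == "second":
        return "leave-deploy-rejoin"
    return "wait"
```

Note that independent coin flips only split the cluster roughly in half; with a small number of nodes, one group can end up much larger than the other, or even empty.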

The one thing that feels lacking to me, and maybe there's just something I don't get, is the ability to tag nodes with metadata. From the docs it seems like what you're expected to do is fire an event for a node to, e.g., declare itself a webserver, but this seems prone to failure in the long run. If I bring a new load balancer online, how does it find out what's a webserver already?
This might be a little unclear, but if you check the documentation for agent configuration (http://www.serfdom.io/docs/agent/options.html), there is an option to provide a role. The role is the current metadata support.
Cool, I must have missed that. It would be nice, then, if you could also have a node be tagged with multiple roles (as well as add/remove them in a way that propagates).
Dynamic roles are a fairly hard problem, which we hope to solve by building a different tool on top of Serf in the near future. If you want multiple roles that are static, it can be simulated by just providing a comma-separated value as the role (since that is just an opaque string value to Serf).
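
A minimal sketch of the comma-separated trick, in Python. Since the role is an opaque string to Serf, the splitting has to happen in your own scripts. The function names and the `(name, role)` tuple shape for members are assumptions made up for this example, not anything Serf provides.

```python
def parse_roles(role):
    """Split an opaque role string such as "web,lb" into a set of names."""
    return {part.strip() for part in role.split(",") if part.strip()}


def nodes_with_role(members, wanted):
    """Return the names of members whose role string includes `wanted`.

    `members` is an iterable of (name, role_string) pairs; this shape is
    an assumption for the example.
    """
    return [name for name, role in members if wanted in parse_roles(role)]
```

A new load balancer could then pick out webservers with something like `nodes_with_role(members, "web")`, given whatever member list it has obtained.
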
You need to do some repair on the website for iOS; the front page is white text on a white background.
Fixed. How does CSS work?
Since the main goal seems to be node discovery, can you compare why I might want to use this in addition to Salt? I've found Salt's node discovery and targeting tools fast and powerful.
We have a comparison against Chef and Puppet, which may be relevant to Salt here: http://www.serfdom.io/intro/vs-chef-puppet.html. Not sure about Salt's search, but Serf is designed to run much more often than config management tools, and is able to handle topology changes in seconds instead of minutes or hours. Serf is also designed from the ground up to be fault tolerant, which is not usually a design goal of config management tools.