by noahnoahnoah 5126 days ago
(I work for 37signals and wrote batsd)

This is really just one piece in a bigger set of things to track performance, usage, etc.

You can think of it as: Emitters --> Statsd (or in this case, Batsd) --> dashboards, alerts, etc.
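
For readers who haven't used statsd, an "emitter" is usually just a process that fires small UDP datagrams at the aggregator. The minimal sketch below assumes batsd accepts the standard statsd line format on UDP (`name:value|c` for counters, `name:value|ms` for timers, `name:value|g` for gauges). The host, port 8125, and metric names are illustrative assumptions and do not come from batsd's docs.

```python
import socket


def format_metric(name: str, value: float, kind: str) -> bytes:
    """Encode one sample in the statsd line format: name:value|type."""
    if kind not in ("c", "ms", "g"):
        raise ValueError(f"unsupported metric type: {kind}")
    # Render whole numbers without a trailing ".0" to keep packets tidy.
    text = str(int(value)) if float(value).is_integer() else repr(value)
    return f"{name}:{text}|{kind}".encode("ascii")


def emit(name: str, value: float, kind: str,
         host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send: a dropped packet just means a lost sample."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_metric(name, value, kind), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, e.g. from a log-tailing emitter.
    emit("nginx.requests", 1, "c")
    emit("rails.request_time", 123.4, "ms")
```

UDP is the usual choice here because an emitter should never block or fail the application it instruments if the aggregator is down.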

We have emitters that parse log files from Nginx, HAProxy, bluepill, postfix, etc., a gem inside all of our Rails apps, and a variety of other scripts that gather data. All of them point at batsd, which aggregates and stores the data. We then pull that data into graphs on our dashboard, and we also rely on it heavily for Nagios alerting. The repository includes a basic sample client that we use for those purposes, though you're right that out of the box it only gets you raw numbers.
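
To make the "aggregates" step concrete: a statsd-style daemon typically buffers incoming samples and, every flush interval, reduces them to a few numbers per metric, with counters summed and timers summarized. The sketch below illustrates that idea under the assumption that input arrives in the statsd line format. It is not batsd's code, and the output suffixes (`.count`, `.mean`, `.upper`, `.upper_90`) are borrowed loosely from statsd conventions rather than from batsd's storage.

```python
import math
from collections import defaultdict
from statistics import mean


class Aggregator:
    """Toy statsd-style aggregator: buffers samples, summarizes them on flush."""

    def __init__(self) -> None:
        self.counters: dict[str, float] = defaultdict(float)
        self.timers: dict[str, list[float]] = defaultdict(list)

    def ingest(self, line: str) -> None:
        # Expected shape: name:value|type, e.g. "nginx.requests:1|c".
        name, rest = line.strip().split(":", 1)
        value, kind = rest.split("|", 1)
        if kind == "c":
            self.counters[name] += float(value)
        elif kind == "ms":
            self.timers[name].append(float(value))
        # Anything else (gauges, sampled lines like "x:1|c|@0.1") is skipped here.

    def flush(self) -> dict[str, float]:
        """Summarize everything seen since the last flush, then reset."""
        out: dict[str, float] = dict(self.counters)
        for name, samples in self.timers.items():
            ordered = sorted(samples)
            # Nearest-rank 90th percentile.
            rank = math.ceil(0.9 * len(ordered))
            out[f"{name}.count"] = len(ordered)
            out[f"{name}.mean"] = mean(ordered)
            out[f"{name}.upper"] = ordered[-1]
            out[f"{name}.upper_90"] = ordered[rank - 1]
        self.counters.clear()
        self.timers.clear()
        return out
```

For example, ten timer samples of 1 through 10 ms flush to a count of 10, a mean of 5.5, an upper of 10 and an upper_90 of 9. Only these summaries, not every raw sample, need to reach storage.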

We're planning on releasing more of both the "emitters" that gather data, as well as a major part of our graphing/dashboard interface "soon".

And point well taken about making it more obvious how to get started and what you can use it for. I'll work on improving the documentation.


Could you explain briefly why you chose to write a replacement for statsd, rather than improve on it? What aspects of statsd were you not happy with?

(I don't have a horse in this race, I haven't used statsd before -- but I am planning to deploy some sort of statistics gathering soon and I wonder why I would choose your implementation over Etsy's, apart from the obvious appeal of the 37signals brand.)

Briefly, probably not (everyone here at 37signals got treated to a 3000 word treatise on our statsd journey a few weeks ago). I did write up a few reasons at https://github.com/noahhl/batsd/blob/master/doc/why-not.md.

In short: as a company we have a ton of Ruby experience and comparatively little Python/Node.js experience, both in understanding the tools we use (which we like to do) and in being able to confidently manage dependencies, etc. We also knew we would eventually want to build our own UI anyway, which limited the usefulness of Graphite itself.

Edited to add: I can't stress this enough. Etsy's statsd and Graphite are both fantastic pieces of engineering, with fantastic communities and support behind them (there's a fascinating writeup about Graphite in particular at http://www.aosabook.org/en/graphite.html).

I briefly read the chapter on persistence. Basically you're doing what RRD originally did (one file per metric), except without actual round-robin storage, the way things were before rrdcached was born. The long-term performance implications could be worrisome: unless you're backing this with solid-state storage, with many thousands of metrics the disk's seek capacity may not keep up with the I/O flush rate.
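
The concern can be made concrete with a back-of-envelope estimate. Assume, hypothetically, N metrics, each stored in its own file and flushed every T seconds, where each file write costs roughly one random seek on a spinning disk. The required write rate is then approximately:

```latex
\text{writes/s} \approx \frac{N}{T}
\qquad\text{e.g.}\qquad
\frac{10\,000\ \text{metrics}}{10\ \text{s}} = 1\,000\ \text{writes/s}
```

A single 7200 rpm disk is commonly quoted at on the order of 100 random IOPS. Under these illustrative numbers, the flush alone would need roughly ten times what one disk can deliver, before counting any reads from dashboards or alerting. rrdcached addresses this by buffering updates and writing them to each file in batches.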

You're witnessing second degree dilettantism at work.

Remember, we started out with a rock-solid reference impl called RRDTool. RRDTool is 13 years old and about as mature as it gets. It's also surprisingly usable and relatively wart-free.

However, its documentation is not written as a narrative "guide", so inevitably some kid found it too complicated and decided to reinvent it, without realizing how far out of his depth he was. That's how Graphite happened.

Now 37signals sees Graphite and goes full Dunning-Kruger with yet another knock-off, this time leaving out everything that would demonstrate the slightest understanding of the problem domain. While Graphite at least tried to mimic the RRDTool file format, 37signals just skips over that whole "complicated binary stuff" and writes the data as newline-delimited ASCII text...

I believe Graphite/Whisper were created to address some limitations in RRDTool: http://readthedocs.org/docs/graphite/en/latest/whisper.html#...

Are you saying that graphite is somehow deficient? How is/was the author "out of his depth"?

> While graphite at least tried to mimic the RRDTool file-format 37signals just skips over that whole "complicated binary-stuff" and writes the data as newline-delimited ascii-text...

What benefit lies in trying to mimic RRDTool's file format?

Scalability.
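
One way to read "Scalability" here: an RRD-style archive preallocates a fixed number of fixed-width binary slots, so the file never grows and the write position for any timestamp is computed rather than searched for or appended to. The sketch below illustrates only that idea. It is not RRDTool's or Whisper's actual on-disk format, and the step, slot count, and record layout (big-endian uint32 timestamp plus float64 value) are assumptions.

```python
import os
import struct
from typing import Optional

# Assumed record layout: big-endian uint32 timestamp + float64 value (12 bytes).
RECORD = struct.Struct("!Id")


class RoundRobinFile:
    """Fixed-size archive of `slots` records, one per `step` seconds, wrapping around."""

    def __init__(self, path: str, step: int, slots: int) -> None:
        self.path, self.step, self.slots = path, step, slots
        if not os.path.exists(path):
            # Preallocate once; the file never grows after this.
            with open(path, "wb") as f:
                f.write(b"\x00" * (RECORD.size * slots))

    def _offset(self, ts: int) -> int:
        # The slot position is pure arithmetic: no scanning, no appending.
        return ((ts // self.step) % self.slots) * RECORD.size

    def write(self, ts: int, value: float) -> None:
        aligned = ts - ts % self.step
        with open(self.path, "r+b") as f:
            f.seek(self._offset(ts))
            f.write(RECORD.pack(aligned, value))

    def read(self, ts: int) -> Optional[float]:
        aligned = ts - ts % self.step
        with open(self.path, "rb") as f:
            f.seek(self._offset(ts))
            stored_ts, value = RECORD.unpack(f.read(RECORD.size))
        # A mismatched timestamp means this slot currently holds a different time's value.
        return value if stored_ts == aligned else None
```

Compared with appending a line of text per sample, both the file size and the cost of a write stay constant no matter how long a metric lives. The price is a fixed retention window: once the archive wraps, old values are overwritten.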

That makes sense, and speaks to me (I'm more of a Ruby guy myself.) Thanks for taking the time to reply.

Edit: and, "it looked like it would be easy" made my day :-)

Thank you, that's very helpful. Looking forward to the emitters and dashboard whenever they're ready - I suspect they will help drive adoption of Batsd and encourage development of additional emitters.

Definitely. There's a teaser screenshot of Flyash (the big, reusable chunk of our dashboard) towards the bottom of http://37signals.com/svn/posts/3091-pssst-your-rails-applica... (that post also details some of the major emitter components we use).

"457,739 different metrics in Flyash" Oh...kay... Flyash looks very nice from the screenshots, but I'm interested to learn how you solve the discoverability problem with that much data. (Too much of a good thing?)

Looks great, can't wait to see it open sourced. I've been dealing with the clunkiness of statsd/data -> graphite -> graphene for a dashboard, and more than a handful of times I've almost started writing exactly what it looks like you've already built.

Any idea when we'll be able to use/contribute to Flyash?

A couple of weeks, probably, depending on how much I feel like working on it. It was designed to be modular and easily extracted, but still needs some cleanup work and has a few nasty bugs I'd like to fix first.