Hacker News
by dkhenry 4793 days ago
Off the top of my head, this is a reimplementation of the following:
* SNMP
* CollectD
* Carbon
* JMX
* WMI
* CMIP

And a whole host of other proprietary transports. So it's cool and looks awesome, but what does it give me that the entirety of other monitoring protocols doesn't?


I'm not affiliated with OP, but I wrote perhaps the most similar OSS project, so I have some perspective here.

There are a bunch of things going on on your boxes (logs, JMX, syslog, etc.), and you want to get them out in a useful, unified format. You have to do some ugly things (e.g. parse Rails logs for latencies) and then emit the data, preferably in a structured format that knows that render=17ms is a duration so that you can graph it.

They chose their own transport to speak between Heka nodes because it maps perfectly to their internal representation, but it looks like they are willing to speak any of those protocols you mentioned to the outside world. It's useful to do a limited amount of munging inside the Heka system before sending the data to Logstash, Graphite, etc., so it looks like they spent quite a bit of time building a framework for that initial work, letting you move it as close to the edges as you'd like.

To me, the transport and/or protocol isn't the interesting part; it's that you have a flexible, lightweight agent that's also capable of doing pre-processing and rollups.

One of the driving motivations was simplicity for developers and a reasonable out-of-the-box experience.

This comes from a couple of things.

Go compiles to a single statically linked binary, so you don't have to worry about having dozens of "the right" libraries installed on your machine. Grab the heka binary and run with it.

This greatly eases our operations work as we have fewer dependency conflicts to deal with when we push things to production.

That doesn't make a lot of sense. You don't have to write monitoring software from scratch just because you want statically compiled, bundled libraries. You can do that with any programming language.
How do you run Python, Java, Perl, Ruby, or any JVM language without an installed runtime?
Quite easily. All offer options for building standalone programs that don't need a pre-installed runtime.

You just copy them to some directory, run them and they work.

And some of them even support building native binaries (e.g. Java through gcc).

Virtually nobody in practice uses any of these.† Java binaries are in practice JVM bytecode in classfiles. Python programs are run by the Python interpreter.

Go compiles to native code. Not only do you not need a preinstalled Go runtime on a target system, but there's very little advantage to even having one. The normal way of installing a Golang program is simply to copy the binary and run it. That's powerfully simpler than most other modern programming languages, with the obvious exception(s) of C/C++/ObjC.

A commenter downthread says the same thing, but let me add that we look at other people's Python/Java/Ruby programs professionally, and I can't recall a single client ever doing anything like this.

>Virtually nobody in practice uses any of these.† Java binaries are in practice JVM bytecode in classfiles. Python programs are run by the Python interpreter.

The "virtually nobody" thing is because the main use cases for Python and Java are as server-side languages (both) and scripting languages (Python). In those cases people are expected to have, or to set up, the appropriate runtime beforehand.

But for people who want to ship apps to end users (customers and consumers) with Java and Python, bundling is very, very common.

People using them in the end-user space regularly do it this exact way. For most of these apps, you never even learn what they use underneath.

Some examples:

- Dropbox (uses and bundles Python in the app).

- Vuze torrent client (previously Azureus and very popular in its prime) bundles a JRE (for when you don't have an installed one).

- LightTable is just a JS runtime bundled with WebKit as a standalone app.

Huh. Most of the JVM shops I've worked at deploy apps as a monolithic fat .jar, built by their CI system, rather than trying to manage libraries on classpath.
With the gigantic disadvantage of security updates requiring recompiling everything :(
Technically, you could use something like PyInstaller[1] to bundle the runtime and all libraries into an executable package. Practically, no one ever does that. :)

[1] http://www.pyinstaller.org/

Interesting, so it's more of an ease-of-use issue than a performance issue? The performance numbers you quoted seemed impressive.
We needed performance as well as simplicity.

We started by extending Logstash, but our needs were more "we need a router", and Logstash isn't meant to be a router.
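A router in this sense takes every incoming message and fans it out to each output whose matcher accepts it. The sketch below is a hypothetical, synchronous version using plain Go predicates; `Message`, `Router`, and the output names are illustrative assumptions, not Heka's actual matcher mechanism, and a production router would likely deliver over channels rather than calling outputs inline.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for an agent's internal record.
type Message struct {
	Type    string
	Payload string
}

// route pairs a predicate with a destination.
type route struct {
	name    string
	match   func(Message) bool
	deliver func(Message)
}

// Router fans each message out to every output whose matcher accepts it.
type Router struct {
	routes []route
}

// Add registers an output together with its matcher.
func (r *Router) Add(name string, match func(Message) bool, deliver func(Message)) {
	r.routes = append(r.routes, route{name, match, deliver})
}

// Dispatch delivers m to all matching outputs and returns their names.
func (r *Router) Dispatch(m Message) []string {
	var hit []string
	for _, rt := range r.routes {
		if rt.match(m) {
			rt.deliver(m)
			hit = append(hit, rt.name)
		}
	}
	return hit
}

func main() {
	var r Router
	r.Add("logstash", func(m Message) bool { return m.Type == "log" },
		func(m Message) { fmt.Println("-> logstash:", m.Payload) })
	r.Add("graphite", func(m Message) bool { return m.Type == "metric" },
		func(m Message) { fmt.Println("-> graphite:", m.Payload) })
	r.Add("archive", func(Message) bool { return true },
		func(m Message) { fmt.Println("-> archive:", m.Payload) })

	r.Dispatch(Message{Type: "log", Payload: "GET /users 200"})
	r.Dispatch(Message{Type: "metric", Payload: "render_ms 17"})
}
```

The point of the design is that one message can go to several sinks at once (here every message also lands in "archive"), which a linear input-filter-output pipeline does not naturally express.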

Statically linking the world isn't trivial. For our existing Python code bases - how are you going to deal with third party libraries from PyPI?

Come by #heka on irc.mozilla.org; we're kicking around in there.

> Statically linking the world isn't trivial. For our existing Python code bases - how are you going to deal with third party libraries from PyPI?

Depends on what you want.

You could freeze your pip requirements so the same versions are always installed, and use one virtualenv per application. This is basically the same as bundling everything together: it has all the benefits with the least amount of work.

You could use distribution packages for security, correctness, and stability, or even roll your own repository inside your infrastructure to absolutely control everything.

Finally, you could just bundle everything manually by fooling around with the PYTHONPATH and putting all the dependencies in a single directory. It's kind of like improvising your own virtualenv: very hacky, but it can work.

Another member of the Heka team here. Yes, there are a lot of options for managing Python deployments. But none of Python's stories are as nice as "Here's a single binary, put this on every machine."
Exactly! I started playing with Go a few days ago and immediately started thinking that this would be the perfect language to create something like Logstash, Flume, or Splunk agents in, and install it on the machines that need to forward data to our centralised logging system.

It really bugs me that I have to have a Python interpreter on the frontend web machines (because I would prefer not to have a C compiler there).

SNMP -> shivers