Storm sounds great, but this post probably should have waited until it was actually open-sourced. As it is, it just comes across as naked self-promotion based on a technology that could for all we know be vaporware.
I think people here are a little too harsh. Storm sounds like an amazing product and I can't wait to play with something like that. Right now, we run a bunch of cron jobs every minute with intense MapReduce queries on MongoDB to generate relatively up-to-date analytics. Something like this would be immensely useful. (As would MongoDB's new 2.0 aggregation pipeline features.)
Now, I agree that it's kind of a bummer we can't play with it right now, but the fact that you guys made this and are going to open source it is already awesome in itself.
As a happy user of ElephantDB, I'd say people are definitely too harsh. ElephantDB is awesome - my company has completely replaced HBase with ElephantDB and MaryJane (a lightweight way of putting data into Hadoop that we wrote, https://github.com/stucchio/MaryJane- ).
That said, I'd love to see some code released, even if it isn't ready for primetime.
Can you say anything about what made ElephantDB + MaryJane better than HBase for your workload? (Occasional batch loads that then need random reads but not random inserts?)
Absolutely - I need batch loads and random reads. The term "insert" is somewhat meaningless here - I have random appends, and a periodic MapReduce job compiles the randomly appended data into structured data to be served via ElephantDB. The structured data requires random queries. In principle, HBase should have filled my needs completely. But in practice, I couldn't make it work.
Our HBase cluster (3 boxes serving 30 human oracles, each submitting data at a rate of 1 record every 5-10 seconds) choked frequently - i.e., it stopped accepting new records. Ultimately what I had to do was have the human data go into Postgres, with a cron job flushing that into HBase every half hour or so.
I'll emphasize that this is probably my fault. I'm not claiming HBase doesn't scale to 30 concurrent users - clearly Facebook demonstrates it can. But I couldn't figure out how to make that happen. HBase is a complex system and I make no claim of understanding it.
ElephantDB + MaryJane are simple. There is almost nothing that can go wrong - put together they probably amount to 5000 lines of code and have as many as 10 minimally interacting configuration options. The effort required to manage them is minimal - I had EDB working flawlessly in less than a day.
HBase is an enterprise tool. It works well if you are Facebook and can put a couple of people on maintenance duty. It's overkill if you are Styloot (my stealth mode startup, currently smaller than Backtype).
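The append-then-compile pattern described above (random appends, a periodic batch job, then read-only random lookups) can be sketched in a few lines. This is only an in-memory illustration under stated assumptions: `AppendLog`, `compile_batch`, and `ReadOnlyStore` are hypothetical names made up for this sketch, not the APIs of MaryJane, ElephantDB, or Hadoop, and a plain function stands in for the MapReduce job.

```python
from collections import defaultdict


class AppendLog:
    """Hypothetical append-only record log (stands in for the raw data store)."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        # Writers only ever append; nothing is updated in place.
        self._records.append((key, value))

    def records(self):
        # Return a copy so the batch job sees a stable view of the log.
        return list(self._records)


def compile_batch(records):
    """Stand-in for the periodic batch job: group raw records by key."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    # Freeze the values so the served snapshot cannot be mutated by readers.
    return {key: tuple(values) for key, values in grouped.items()}


class ReadOnlyStore:
    """Hypothetical read-only key/value store serving the latest snapshot."""

    def __init__(self):
        self._snapshot = {}

    def swap(self, snapshot):
        # Replace the whole snapshot at once; there are no random inserts.
        self._snapshot = snapshot

    def get(self, key):
        return self._snapshot.get(key, ())


log = AppendLog()
log.append("user:1", "click")
log.append("user:2", "view")
log.append("user:1", "purchase")

store = ReadOnlyStore()
store.swap(compile_batch(log.records()))  # what the periodic job would do
print(store.get("user:1"))
```

Records appended after a swap stay invisible to readers until the next batch run. That staleness is the trade-off the pattern accepts in exchange for a serving layer with very few moving parts.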
We've released open source projects (most notably ElephantDB and Cascalog) in the past that are successfully used in production by us as well as other companies. You should check them out if you're interested in a measure of quality, though I understand your concern.
We're a startup — we're not going to write an academic paper supporting the claims in the post. Nevertheless, Storm's an exciting project many people are curious to learn more about; that's why we've written something about it now.
We have a demo coming soon, and Storm itself will be open sourced soon enough.
I absolutely understand the issue of being resource constrained.
It seems like this is buzz-worthy (like http://mailchimp.com/omnivore/), but this pitch is nerd-focused, not potential-customer-focused. If you pitch to nerds, you want a GitHub link. If you pitch to potential customers, highlight the benefits that are now possible due to this innovation.
At least in our batch, we got drilled this repeatedly: Don't talk features. Talk benefits.
Your criticism is totally fair. People have been curious about Storm so we wanted to provide a little bit of information about it. We'll have demos soon, and of course it will be open sourced within a few months.
If you're curious about our credibility, I think our other open source projects speak to the quality of software we produce:
https://github.com/nathanmarz/cascalog https://github.com/nathanmarz/elephantdb