| HN Mirror

	by vannevar 5503 days ago
	Storm sounds great, but this post probably should have waited until it was actually open-sourced. As it is, it just comes across as naked self-promotion based on a technology that could for all we know be vaporware.

4 comments

nathanmarz 5503 days ago

(I'm the author of Storm)

Your criticism is totally fair. People have been curious about Storm so we wanted to provide a little bit of information about it. We'll have demos soon, and of course it will be open sourced within a few months.

If you're curious about our credibility, I think our other open source projects speak to the quality of software we produce:

https://github.com/nathanmarz/cascalog https://github.com/nathanmarz/elephantdb

SeoxyS 5503 days ago

I think people here are a little too harsh. Storm sounds like an amazing product and I can't wait to play with something like that. Right now, we run a bunch of cron jobs every minute with intense MapReduce queries on mongodb to generate relatively up-to-date analytics. Something like this would be immensely useful. (As well as Mongo's new 2.0 Aggregation pipeline features.)

Now, I agree that it's kind of a bummer we can't play with it right now, but the fact that you guys made this and are going to open source it is already awesome in itself.

yummyfajitas 5503 days ago

As a happy user of ElephantDB, I'd say people are definitely too harsh. ElephantDB is awesome - my company has completely replaced HBase with ElephantDB and MaryJane (a lightweight way of putting data into Hadoop that we wrote, https://github.com/stucchio/MaryJane- ).

That said, I'd love to see some code released, even if it isn't ready for primetime.

gojomo 5503 days ago

Can you say anything about what made ElephantDB + MaryJane better than HBase for your workload? (Occasional batch loads that then need random reads but not random inserts?)

yummyfajitas 5503 days ago

Absolutely - I need batch loads, and random reads. The term "insert" is somewhat meaningless - I have random appends, and a periodic mapreduce job compiles the randomly appended data into structured data to be served via ElephantDB. The structured data requires random queries. In principle, HBase should have filled my needs completely. But in practice, I couldn't make it work.

Our HBase cluster (3 boxes serving 30 human oracles, each submitting data at a rate of 1 record every 5-10 seconds) choked frequently - i.e., it stopped accepting new records. Ultimately what I had to do is have the human data go into postgres and a cron job flushed that into HBase every half hour or so.

I'll emphasize that this is probably my fault. I'm not claiming HBase doesn't scale to 30 concurrent users - clearly Facebook demonstrates it can. But I couldn't figure out how to make that happen. HBase is a complex system and I make no claim of understanding it.

ElephantDB + MaryJane are simple. There is almost nothing that can go wrong - put together they probably amount to 5000 lines of code and have as many as 10 minimally interacting configuration options. The effort required to manage them is minimal - I had EDB working flawlessly in less than a day.

HBase is an enterprise tool. It works well if you are Facebook and can put a couple of people on maintenance duty. It's overkill if you are Styloot (my stealth mode startup, currently smaller than Backtype).
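For anyone unfamiliar with the pattern, here is a minimal sketch of the append-then-compile workflow described above. It is not ElephantDB's or MaryJane's actual API: `AppendLog`, `compile_batch`, and `ReadOnlyStore` are hypothetical names, and plain in-memory lists stand in for Hadoop storage and the mapreduce job.

```python
import bisect
from collections import defaultdict


class AppendLog:
    """Hypothetical append-only sink: records arrive in arbitrary order."""

    def __init__(self):
        self._records = []

    def append(self, key, value):
        self._records.append((key, value))

    def drain(self):
        """Hand all buffered records to the batch job and reset the log."""
        records, self._records = self._records, []
        return records


def compile_batch(records):
    """Stand-in for the periodic batch job: group values by key, sort by key."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return sorted(grouped.items())


class ReadOnlyStore:
    """Hypothetical read-only store: random reads only, no inserts."""

    def __init__(self, compiled):
        self._keys = [key for key, _ in compiled]
        self._values = [values for _, values in compiled]

    def get(self, key):
        # Binary search over the sorted keys produced by compile_batch.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


log = AppendLog()
log.append("item:2", "tagged")
log.append("item:1", "tagged")
log.append("item:2", "reviewed")
store = ReadOnlyStore(compile_batch(log.drain()))
```

The point of the split is that the serving side never handles writes: each batch run builds a fresh sorted structure, and the store only answers lookups against it.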

gojomo 5503 days ago

Thanks! So on each batched load, is the previous data rewritten with interleaved new data? Or is the key ordering such that that's never necessary?

omakase 5503 days ago

Cool, great to see you're making use of EDB. Would love to hear more about how you're using it, how the transition was, etc. mm@backtype.com.

nphase 5503 days ago

I'm still waiting on Twitter's rainbird (http://www.slideshare.net/kevinweil/rainbird-realtime-analyt...) to come out!

matclayton 5503 days ago

Me too. I asked them about it the other day.

Response - http://twitter.com/#!/kevinweil/status/73263430873792512

ora600 5503 days ago

Also, the lack of any scalability charts or diagrams of architecture is suspicious.

If you can't make it open source, at least write a serious paper to support the claims, like Google did for Bigtable.

A lot of people think their systems are scalable and fault-tolerant. Most are not. And from the information provided, we can't tell.

konsl 5503 days ago

We've released open source projects (most notably ElephantDB and Cascalog) in the past that are successfully used in production by us as well as other companies. You should check them out if you're interested in a measure of quality, though I understand your concern.

We're a startup — we're not going to write an academic paper supporting the claims in the post. Nevertheless, Storm's an exciting project many people are curious to learn more about; that's why we've written something about it now.

We have a demo coming soon, and Storm itself will be open sourced soon enough.

mceachen 5503 days ago

I absolutely understand the issue of being resource constrained.

It seems like this is buzz-worthy (like http://mailchimp.com/omnivore/), but this pitch is nerd-focused, not potential-customer focused. If you pitch to nerds, you want a github link. If you pitch to potential customers, highlight the benefits that are now possible due to this innovation.

At least in our batch, we got drilled this repeatedly: Don't talk features. Talk benefits.

justincormack 5503 days ago

Don't call it "the Hadoop of" if it is not open source. Hadoop is notable as an open source project, not as an actually new idea...

bengl3rt 5503 days ago

Precisely. I read the whole post looking for a link to the source on github or something, and then the last sentence was just a huge letdown.
