by Cloven 5171 days ago
I think the Riak pages are not geared at the people making the selection, e.g., architect-level technologists. The wiki is full of marketing claims ('Riak is the most powerful open-source, distributed database you'll ever put into production') ('Riak is the most boring database you’ll ever run in production').

But then suddenly:

    curl -v -X PUT -d '{"bar":"baz"}' -H "Content-Type: application/json" \
      -H "X-Riak-Vclock: a85hYGBgzGDKBVIszMk55zKYEhnzWBlKIniO8mUBAA==" \
      http://127.0.0.1:8091/riak/test/doc?returnbody=true

Now, I understand why that is the way that it is -- the Riak guys are simultaneously very proud of their product and extremely technical -- but it's missing the middle ground. The middle ground is where you explain the characteristics of the system in non-marketing terms, describe what it's good at and what one could reasonably expect it to do, and also describe where it fails horribly and what you should not try to use it for. Once that's been outlined, _then_ bring out X-Riak-Vclock.
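
For readers who don't read curl flags fluently, the quoted request can be sketched in Python. This is only a sketch under stated assumptions: the helper name `build_riak_put` is made up for illustration, the host, bucket, key, and vclock value are copied from the curl line above, and the request is built but never sent, so it runs without a Riak node. As far as I understand it, the X-Riak-Vclock value is an opaque token that the client hands back from an earlier read.

```python
import json
import urllib.request


def build_riak_put(base_url, bucket, key, doc, vclock):
    """Build (but do not send) the PUT request from the quoted curl line.

    build_riak_put is a hypothetical helper, not part of any Riak client.
    """
    url = f"{base_url}/riak/{bucket}/{key}?returnbody=true"
    body = json.dumps(doc).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Opaque token copied from the curl example; normally taken from a prior read.
        "X-Riak-Vclock": vclock,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


req = build_riak_put(
    "http://127.0.0.1:8091",
    "test",
    "doc",
    {"bar": "baz"},
    "a85hYGBgzGDKBVIszMk55zKYEhnzWBlKIniO8mUBAA==",
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen would actually send it to a running node.
```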

By comparison, e.g., redis.io has simple pages describing every command in the system and, critically, the associated algorithmic complexity and discussion of likely issues and problems. It describes what performance expectations are likely to be achieved on commodity hardware. And it allows you to test out your thoughts in real time right there on the page.

Personally, I have very little idea how, e.g., Riak compares to Redis. And I built Erlang and Riak from source and did the tutorial. I don't have a sense for how many ops/sec Riak can manage, what the equivalents of sinterstore and sunion look like, or what the minimal real architecture for a production box setup should be. And a lot of the tutorial fills me with The Fear.
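
For anyone who hasn't used the two Redis commands named above: SUNION returns the union of several sets, and SINTERSTORE writes the intersection of several sets to a destination key and returns its size. A rough in-process analogue using Python sets is sketched below. The `store` dict, the key names, and the two functions are illustrative stand-ins, not a Redis client, and nothing here says what the Riak equivalent would be.

```python
# In-memory stand-in for a key -> set store (hypothetical data).
store = {
    "team:red": {"ann", "bob", "cat"},
    "team:online": {"bob", "cat", "dan"},
}


def sunion(store, *keys):
    """Return the union of the sets at the given keys (missing keys count as empty)."""
    result = set()
    for key in keys:
        result |= store.get(key, set())
    return result


def sinterstore(store, dest, *keys):
    """Store the intersection of the given sets at dest and return its size."""
    sets = [store.get(key, set()) for key in keys]
    store[dest] = set.intersection(*sets) if sets else set()
    return len(store[dest])


everyone = sunion(store, "team:red", "team:online")
count = sinterstore(store, "team:red:online", "team:red", "team:online")
print(sorted(everyone), count, sorted(store["team:red:online"]))
```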

Which brings me to the last point: The Fear of Riak is pretty strong, and that's because very few people are running Erlang on purpose in production. A lot of developers (and certainly devops people) have a hard enough time with their existing stack, without bringing on an entirely alien software, logging, alerting, monitoring, managing, and developing stack, and trying to understand how to reason about it. And even those developers who can work up the courage to dive into Erlang will have to deal with the fact that they will be novices for quite a long time on an extremely technical product that is designed to be at the core of their world.


redis.io has simple pages describing every command in the system and, critically, the associated algorithmic complexity and discussion of likely issues and problems.

This. A thousand times.

Every database should be required by law to have that kind of reference. With most of the NoSQL databases it's ridiculously difficult to figure out basic things like "does this support range queries at all?" or (god forbid!) "how would I implement range queries efficiently?". You're lucky if they casually mention which concepts their DB was originally based on (bigtable somethingsomething dynamo something), so you get at least a rough idea of what to expect.

Writing documentation is hard. But many of the contenders (including Riak) are so overtly hostile that it seems almost deliberate.

I think the point you bring up about comparing the ops/sec between Riak and Redis is very interesting in multiple ways:

1. Riak has an in-memory mode (it's one of the backends you might choose, and you can run multiple backends simultaneously), but most people don't know about it, and I bet you were thinking of a comparison against disk-based storage (though I think of Redis as an in-memory database).

2. It's a typical, expected question, but it also betrays a profound ignorance about the nature of scalability (sorry, not calling you ignorant, just saying the culture of technologists is kinda ignorant): Redis exists only on one node, while Riak is distributed.

Redis on one node vs Riak on 100 nodes is going to be one hell of a comparison in favor of Riak!

But everywhere you look, people are doing benchmarks on single nodes. MongoDB doesn't scale in a homogeneous distributed way, but people think it's faster than Riak because its single-node performance is higher (I presume).

3. The people who are making these decisions do not understand what they are doing, I think. Being afraid of Erlang because you have trouble managing your existing stack is like being afraid of a Volvo because your current car is unsafe. Stability and manageability are Erlang's hallmarks, but most people are kinda ignorant of this (though of course it does work differently from typical software).

This is not to say your points aren't good (they are), but I think there's a lot of education needed, and thus a gap that Riak has to bridge.

How do you show your superiority to people whose prejudices or misunderstandings make them unable to recognize it?

How do you show your superiority to people whose prejudices or misunderstandings make them unable to recognize it?

By drawing a handful of very simple diagrams.

"This is what a database that you may know looks like". "And this is how our database looks in comparison".

And by elaborating with an HTML <table>: "Our Database" vs "Their databases".

It's not rocket science, really.

The people who are making these decisions do not understand what they are doing.

Yes, exactly -- to paraphrase Feynman -- you don't fully understand something until you "turn it over, to see why it's true, from every angle you can find." You don't understand all the benefits, the issues, the problems, the gotchas -- until you explore it from every angle, you're just taking someone else's word.

The problem with adoption of any new technology is there is no shortage of advocates making claims that you should use X for... but there aren't enough hours in the day to explore them all. Which ones are you going to take the leap of faith to learn, and why? The more foreign and ambiguous the technology, the bigger the leap of faith required.

Nirvana, good point about ops/sec not being a be-all end-all, but I'd continue to argue that you're looking at the problem the wrong way. As the technical director for my game company, my primary concern is not 'whether the database has an in memory database mode' or 'does the database have a distributed scalability solution', but 'will it do what I want?' tied with 'how much does it cost technically, organizationally, chronologically and monetarily to fulfil my requirement?'

That starts with, rationally enough, ops/sec. Because you would not believe the sheer number of databases or solutions that fall down horribly when you ask the ops/sec question. Anyone remember EJB 1.0? And the Riak answer should be "yes. We can do ops/sec. On 1 m1.large node, we do X,000 writes/sec and A,000 reads/sec. On 10 nodes, we do YY,000 writes/sec and BB,000 reads/sec. Our largest installation has QQQ nodes."
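
The kind of figure being asked for here comes out of a very small harness. The sketch below times a stand-in in-process dict, so the numbers it prints say nothing about Riak or Redis; `ops_per_sec`, `put`, and `store` are made-up names, and the idea is that `op` would be swapped for a real client call against a real node or cluster.

```python
import time


def ops_per_sec(op, n=100_000):
    """Call op(i) n times and return the measured operations per second."""
    start = time.perf_counter()
    for i in range(n):
        op(i)
    elapsed = time.perf_counter() - start
    return n / elapsed


store = {}  # stand-in for a database; replace with real client calls


def put(i):
    store[i] = i


writes = ops_per_sec(put)
reads = ops_per_sec(store.get)
print(f"writes/sec: {writes:,.0f}  reads/sec: {reads:,.0f}")
```

Running the same harness against 1 node and then 10 nodes gives the pair of numbers the hypothetical answer above would quote.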

Then the question becomes "ok, how about complex queries that aren't just get/set?" And the Riak answer (note: I'm making these up, but you get the idea) is "well, our story continues to evolve, but first, understand that everything is a document, exposed to you as JSON over a web service. Then, we have a pretty complete map/reduce implementation which uses either Erlang or JS as the language of your choice. With a fairly complex query that hits thirty fields of a random complex document, across 3 nodes we do about X,000 queries/sec, and across 100 nodes we do about YY,000 queries/sec on m1.larges."
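
For readers who haven't met map/reduce over documents, the shape of such a query is roughly as follows. This is plain Python over a list of dicts, not Riak's map/reduce API; the documents, field names, and function names are invented for illustration.

```python
from functools import reduce

# Hypothetical JSON-like documents, as they might come back from a document store.
docs = [
    {"player": "a", "score": 10},
    {"player": "b", "score": 5},
    {"player": "a", "score": 7},
]


def map_doc(doc):
    """Map phase: emit (key, value) pairs for one document."""
    return [(doc["player"], doc["score"])]


def reduce_pairs(acc, pairs):
    """Reduce phase: fold emitted pairs into running totals per key."""
    for key, value in pairs:
        acc[key] = acc.get(key, 0) + value
    return acc


totals = reduce(reduce_pairs, map(map_doc, docs), {})
print(totals)  # {'a': 17, 'b': 5}
```

In a distributed store the map step would run near the data on each node and only the emitted pairs would travel to the reducer, which is why per-node and per-cluster query rates differ.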

That sort of goal-driven questioning goes on for a while in a logical evaluator's mind. There are a lot of categories. "How do I support this thing? Ease of development? Do I have to know Erlang? What's the first resource that your typical Riak installation runs out of? What common design pattern or use case is totally crappy on Riak?"

I still like the idea of Riak quite a bit, but it's not enough to say that it has a scaling distributed solution and flowers and candy for everybody as a result, because that design makes it axiomatic that some use case will be pessimal. The database has been designed to fill some portion of the set of all database needs; which ones in particular is it best at, which is it less good at, and which is it not intended at all to address?

You show your 'superiority' (sic) to people whose prejudices (sic) make them unable (sic) to recognize it (sic) by talking like a reasonable adult at an appropriate level of abstraction.

(And by the way, Redis can exist on more than one node. My Facebook game involves 8 Redis shards and can read-scale essentially infinitely, with a high degree of data safety.)
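
The comment doesn't say how keys are assigned to the 8 shards. One common approach is client-side hashing, sketched below as an assumption rather than a description of that setup. The host names are hypothetical and no Redis client is involved; the function only picks which shard a key would live on.

```python
import zlib

# Hypothetical shard addresses; the original comment only says there are 8.
SHARDS = [f"redis-shard-{i}:6379" for i in range(8)]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key, so the same key always maps to the same shard."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


for key in ("user:1", "user:2", "guild:99"):
    print(key, "->", shard_for(key))
```

Plain modulo hashing moves most keys when the shard count changes, which is one reason real deployments often use consistent hashing or a fixed slot table instead.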