Modeling Data in MongoDB vs. ArangoDB

	Modeling Data in MongoDB vs. ArangoDB (arangodb.com)
	41 points by spountzy 4240 days ago

7 comments

tedchs 4240 days ago

This actually looks pretty interesting. I appreciate that their FAQ has a great answer to "What is ArangoDB and for what kind of applications is it designed for?" -- more projects need to offer this kind of statement. https://www.arangodb.com/faq

shangxiao 4240 days ago

I like this project and am keeping an eye on it, but tbh that answer doesn't really answer the question in a way that seems objective. It just says it's a "general purpose database offering all the features you typically need for modern web applications".

wiremine 4240 days ago

Does ArangoDB use the same storage strategy MongoDB does? From the FAQ:

"So how much RAM do you need? This depends on the size and structure of your data: Your application will access one or many collections (think of collections as denormalized tables for the time being). Once you open a collection the indexes for this collection are created in the RAM and the data is loaded into the RAM using memory-mapped files. If your collections are bigger than your RAM, the operation system will be forced to swap data in and out of the swap space."

I'm not an expert, but a lot of people seem to harp on MongoDB for this very reason. Does ArangoDB use the same strategy? If not, how is it similar/different?

bjerun 4240 days ago

In principle, ArangoDB behaves similarly to MongoDB here. Both are essentially "mostly-in-memory" databases in the sense that they hold the data in memory and persist it at the same time to disk via memory-mapped files. This approach is good for performance, and if you run out of RAM you ought to shard your data.
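The memory-mapped-file mechanism mentioned here can be illustrated with Python's standard `mmap` module: a write is an ordinary in-memory slice assignment, and `flush()` asks the operating system to write the dirty pages back to the file. This is only a sketch of the OS mechanism under simple assumptions (a fixed 4 KiB file, one write at offset 0), not the storage engine of either database; the file name is hypothetical.

```python
import mmap
import os
import tempfile

def write_through_mmap(path: str, payload: bytes, size: int = 4096) -> None:
    """Write payload at offset 0 of a memory-mapped file and flush it to disk."""
    # Pre-size the file: a mapping cannot extend past the end of the file.
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as mm:
            mm[0:len(payload)] = payload  # an ordinary in-memory write
            mm.flush()                    # ask the OS to persist dirty pages

def read_prefix(path: str, n: int) -> bytes:
    """Read the first n bytes back through normal file I/O."""
    with open(path, "rb") as f:
        return f.read(n)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "collection.dat")  # hypothetical data file
    doc = b'{"name": "Jones"}'
    write_through_mmap(path, doc)
    print(read_prefix(path, len(doc)))
```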

However, MongoDB often uses a lot of memory for the actual data, since its BSON binary format stores the names of the attributes with every single document. ArangoDB detects similar shapes of documents (see https://www.arangodb.com/faq#how-do-shapes-work-in-arangodb) and thus avoids this particular problem.
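A toy model of the overhead described above: count only the bytes spent on attribute names, once when every document repeats them (as BSON does) and once when each distinct "shape" stores them a single time and documents merely reference it. Here a shape is assumed to be just the ordered tuple of attribute names; real BSON also stores type tags and lengths, and ArangoDB's shapes (per its FAQ) carry more than names, so treat this as an illustration of the idea, not either database's format.

```python
from typing import Dict, List, Tuple

Document = Dict[str, object]

def key_bytes_per_document(docs: List[Document]) -> int:
    """Bytes spent on attribute names if every document repeats them."""
    return sum(len(name.encode("utf-8")) for doc in docs for name in doc)

def key_bytes_with_shapes(docs: List[Document]) -> Tuple[int, int]:
    """Bytes spent on attribute names if each distinct shape stores them once.

    Returns (bytes, number_of_shapes). Shape ids are assigned in first-seen
    order; documents would then only need to store a shape id.
    """
    shapes: Dict[Tuple[str, ...], int] = {}
    for doc in docs:
        shapes.setdefault(tuple(doc), len(shapes))
    total = sum(len(name.encode("utf-8")) for shape in shapes for name in shape)
    return total, len(shapes)

if __name__ == "__main__":
    people = [{"firstName": f"user{i}", "lastName": "Jones", "email": f"u{i}@example.com"}
              for i in range(1000)]
    print(key_bytes_per_document(people))  # → 22000 (22 bytes of names x 1000 docs)
    print(key_bytes_with_shapes(people))   # → (22, 1)
```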

wormit123 4240 days ago

I have been bitten by this using MongoDB as well. The shape recognition of ArangoDB sounds very useful. If this works well, it would alleviate a problem that NoSQL solutions so far have in comparison to classical relational databases.

neunhoef 4240 days ago

Interesting article. An obvious reaction is to say: "In a document store, not all joins will be efficient in a sharding situation!". This is true, but certain queries involving joins backed by the right secondary indexes will indeed scale well, therefore one should not use this argument as a reason not to implement joins at all.

MillstoneX 4240 days ago

Can you give an example for a join between two different sharded document collections that can be executed efficiently on say 100 servers?

neunhoef 4240 days ago

Say you have one collection for your people (sharded over 100 servers, say) and another one for conferences (also sharded over 100 machines). Then you could store the primary keys of all conferences a person attended as a JSON list in an attribute of that person's document. A query finding all people with last name "Jones" who have attended a given conference can now be executed efficiently by using a secondary index on the people's last name and performing a key lookup in the conferences collection. If the conferences collection is sharded by key, that lookup only has to talk to one shard and can thus be done efficiently. Obviously, one needs a query optimizer that is aware of the distribution of the shards and the shard keys, but this is certainly doable.
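The join described above can be simulated in memory to count how many shards each side touches. This is a minimal sketch under stated assumptions: both collections are hash-sharded on `_key` (CRC32 modulo the shard count), the `lastName` index is local to each shard, and the class and function names are hypothetical, not ArangoDB's API.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional

def shard_of(key: str, num_shards: int) -> int:
    """Deterministic hash partitioning on the document key."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

class ShardedCollection:
    """Toy collection hash-sharded by "_key", with a per-shard index on "lastName"."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]
        self.last_name_index: List[Dict[str, List[str]]] = [
            defaultdict(list) for _ in range(num_shards)]
        self.requests = 0  # number of shard requests issued by queries

    def insert(self, doc: dict) -> None:
        s = shard_of(doc["_key"], len(self.shards))
        self.shards[s][doc["_key"]] = doc
        if "lastName" in doc:
            self.last_name_index[s][doc["lastName"]].append(doc["_key"])

    def get(self, key: str) -> Optional[dict]:
        # Sharded by "_key", so a key lookup contacts exactly one shard.
        self.requests += 1
        return self.shards[shard_of(key, len(self.shards))].get(key)

    def find_by_last_name(self, last_name: str) -> List[dict]:
        # The index is shard-local: every shard is asked, but each one answers
        # from its index instead of scanning all of its documents.
        hits: List[dict] = []
        for shard, index in zip(self.shards, self.last_name_index):
            self.requests += 1
            hits.extend(shard[k] for k in index.get(last_name, []))
        return hits

def attendees_named(people: ShardedCollection, conferences: ShardedCollection,
                    last_name: str, conference_key: str) -> List[str]:
    conference = conferences.get(conference_key)  # single-shard key lookup
    if conference is None:
        return []
    return sorted(p["_key"] for p in people.find_by_last_name(last_name)
                  if conference_key in p["attended"])

if __name__ == "__main__":
    people, conferences = ShardedCollection(100), ShardedCollection(100)
    conferences.insert({"_key": "pycon"})
    people.insert({"_key": "alice", "lastName": "Jones", "attended": ["pycon"]})
    people.insert({"_key": "bob", "lastName": "Jones", "attended": []})
    people.insert({"_key": "carol", "lastName": "Smith", "attended": ["pycon"]})
    print(attendees_named(people, conferences, "Jones", "pycon"))  # → ['alice']
    print(conferences.requests)  # → 1
```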

rafekett 4240 days ago

Just because the dataset is sharded doesn't mean that one query has to hit every shard. For example, suppose you're looking for documents with `parent_id = foo` and your sharding key is `parent_id`; then an intelligent query planner would only query one shard (the one that "foo" hashes to), and this looks a lot like a join in an RDBMS. Indeed, if you wanted to do (in RDBMS terms) a self-join to load the whole tree of documents rooted at parent_id = foo, and your sharding key were the root of each document, that query would also hit only one shard. The trick is deciding which keys to shard on (and, in many cases, what other keys to shard on in redundant datastores that serve different types of queries).
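The planner rule in this comment can be reduced to a few lines: if a query's equality conditions pin every shard-key attribute to one value, only the shard that value hashes to can hold matches; otherwise the query is broadcast. This is a simplified sketch (real planners also reason about ranges, IN lists and so on); `shard_for` and `target_shards` are hypothetical names, and CRC32 stands in for whatever hash a real system uses.

```python
import zlib
from typing import Dict, List, Sequence

def shard_for(values: Sequence[object], num_shards: int) -> int:
    """Hash the shard-key values of a document or query to a shard id."""
    joined = "\x1f".join(str(v) for v in values)  # unit separator between values
    return zlib.crc32(joined.encode("utf-8")) % num_shards

def target_shards(equality_filter: Dict[str, object],
                  shard_keys: List[str], num_shards: int) -> List[int]:
    """Shards a query must contact, given its equality conditions."""
    if all(k in equality_filter for k in shard_keys):
        # Every shard-key attribute is fixed, so matches live on one shard.
        return [shard_for([equality_filter[k] for k in shard_keys], num_shards)]
    return list(range(num_shards))  # cannot prune: broadcast

if __name__ == "__main__":
    print(len(target_shards({"parent_id": "foo"}, ["parent_id"], 100)))  # → 1
    print(len(target_shards({"title": "bar"}, ["parent_id"], 100)))      # → 100
    # Sharding a tree by its root id keeps the whole tree on one shard.
    tree = [{"_key": k, "root": "foo"} for k in ("foo", "a", "b", "c")]
    print({shard_for([d["root"]], 100) for d in tree}
          == set(target_shards({"root": "foo"}, ["root"], 100)))         # → True
```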

neunhoef 4240 days ago

Right, you were quicker but are essentially saying the same thing as I said in my example.

MillstoneX 4240 days ago

Thanks for both your answers, this is really interesting indeed. I always thought that joins are a "no, no, no" in the NoSQL world. This opens up a whole lot of new possibilities. I will have to have a look at this ArangoDB thing...

Lerato 4240 days ago

Is there a rule of thumb for when you would model a connection as a foreign key and when you would model it as a graph? Or do you always use graphs?

rodeoclown 4240 days ago

Model your data first using foreign keys, then if you have performance issues with specific queries consider ways of optimizing those queries.

Materializing that data in a graph may be one of those optimization candidates.

Steve83 4240 days ago

I've found an interesting slide show: http://de.slideshare.net/arangodb/domain-driven-design-frosc... explaining how to model your data based on techniques from domain driven design.

You have to identify your entities and value objects.

neunhoef 4240 days ago

Another good rule I tend to use is that if your queries will involve variable lengths of paths in your graph, it was probably a good idea to model using a graph. This is because another model would almost certainly need multiple joins, which can kill performance quite quickly.
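To see why variable path lengths favor a graph model: with foreign keys, each extra hop is another self-join, and if the path length is only bounded by a range you don't know how many joins to write. A single breadth-first traversal answers the question directly. This is an in-memory sketch over a plain edge list, not ArangoDB's traversal engine; the function name and the "follows" data are made up.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

def reachable_within(edges: Iterable[Tuple[str, str]], start: str,
                     min_depth: int, max_depth: int) -> List[str]:
    """Vertices (other than start) whose shortest distance from start
    lies in [min_depth, max_depth], in breadth-first order."""
    adjacency: Dict[str, List[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    depth = {start: 0}
    queue = deque([start])
    found: List[str] = []
    while queue:
        v = queue.popleft()
        if depth[v] >= max_depth:
            continue  # do not expand beyond the maximum path length
        for w in adjacency.get(v, []):
            if w not in depth:  # first visit in BFS is the shortest distance
                depth[w] = depth[v] + 1
                if depth[w] >= min_depth:
                    found.append(w)
                queue.append(w)
    return found

if __name__ == "__main__":
    follows = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]
    print(reachable_within(follows, "alice", 1, 2))  # → ['bob', 'erin', 'carol']
    print(reachable_within(follows, "alice", 2, 3))  # → ['carol', 'dave']
```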

Marc64 4240 days ago

I think if you are connecting objects of the same type (e.g. users) you should use graphs. If you have a 1:n relation between different types, you could just as well use foreign keys. For n:m you again need graphs.
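A small sketch of the two modelings this rule of thumb contrasts: a 1:n relation between different types as a foreign-key attribute on the "many" side, and an n:m relation between users as separate edge documents. The edge documents mirror the `_from`/`_to` attributes ArangoDB edges use, but the collections, data and helper functions here are hypothetical and run purely in memory.

```python
from typing import List

# 1:n between different types: each employee stores its department's key.
employees: List[dict] = [
    {"_key": "alice", "department": "sales"},
    {"_key": "bob", "department": "sales"},
    {"_key": "carol", "department": "ops"},
]

# n:m between objects of the same type: friendships as edge documents.
friendships: List[dict] = [
    {"_from": "users/alice", "_to": "users/bob"},
    {"_from": "users/alice", "_to": "users/carol"},
    {"_from": "users/carol", "_to": "users/bob"},
]

def employees_of(department_key: str) -> List[str]:
    """Follow the foreign key backwards: a filter on a single attribute."""
    return [e["_key"] for e in employees if e["department"] == department_key]

def friends_of(user_id: str) -> List[str]:
    """Treat friendship as undirected by looking at both ends of every edge."""
    outgoing = [e["_to"] for e in friendships if e["_from"] == user_id]
    incoming = [e["_from"] for e in friendships if e["_to"] == user_id]
    return sorted(outgoing + incoming)

if __name__ == "__main__":
    print(employees_of("sales"))     # → ['alice', 'bob']
    print(friends_of("users/bob"))   # → ['users/alice', 'users/carol']
```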

Philippos91 4240 days ago

If you have a 1:n relation that you might want to annotate with, for instance, a "type of relation", it is also feasible to use the graph model, as edges can carry attributes.
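As a sketch of this point: a 1:n relation (one person, several others) stored as edge documents, where each link carries its own attributes. A plain foreign-key attribute holds only the referenced key, so per-link data like this would otherwise have to live on the referencing document. The `type` and `since` attributes, the data, and the `related` helper are all hypothetical.

```python
from typing import List

# Each edge is one link of the 1:n relation and carries its own attributes.
edges: List[dict] = [
    {"_from": "people/dana", "_to": "people/eve", "type": "manages", "since": 2012},
    {"_from": "people/dana", "_to": "people/finn", "type": "mentors", "since": 2013},
    {"_from": "people/dana", "_to": "people/gus", "type": "manages", "since": 2014},
]

def related(source: str, relation_type: str) -> List[str]:
    """Targets linked from source by edges of the given type."""
    return [e["_to"] for e in edges
            if e["_from"] == source and e["type"] == relation_type]

if __name__ == "__main__":
    print(related("people/dana", "manages"))  # → ['people/eve', 'people/gus']
    print(related("people/dana", "mentors"))  # → ['people/finn']
```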

MillstoneX 4240 days ago

I like that ArangoDB can be extended by micro services. Does this not raise security concerns, because user code is executed on the DB server?

don71 4240 days ago

This is an argument one often hears. However, V8 is encapsulated quite well, since Chrome has to deal with the same issue.

Furthermore, these micro services can actually improve security: You can implement your own scheme for authentication and authorisation on the document level and deploy it to the database. Then, if your application has various clients for different devices, they are all authorized in the same way by the same code. This leads to a simplification in app development and thus to more security, because there are fewer places to get right and the whole approach is less error prone.
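To make the "one place to get it right" argument concrete, here is a minimal sketch of a document-level authorization rule that every client would pass through on the server. It is written in Python rather than the JavaScript that Foxx services run, and the `owner`/`readers` fields and function names are hypothetical; none of this is Foxx's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Document:
    key: str
    owner: str
    readers: Set[str] = field(default_factory=set)  # hypothetical per-document ACL
    body: Dict[str, object] = field(default_factory=dict)

def can_read(user: str, doc: Document) -> bool:
    """The single authorization rule shared by every client."""
    return user == doc.owner or user in doc.readers

def read_documents(user: str, docs: List[Document]) -> List[str]:
    """What a server-side endpoint would return: only readable document keys."""
    return [d.key for d in docs if can_read(user, d)]

if __name__ == "__main__":
    docs = [Document("n1", "alice"), Document("n2", "bob", {"alice"}), Document("n3", "bob")]
    print(read_documents("alice", docs))  # → ['n1', 'n2']
```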

aikah 4240 days ago

First time I've heard about ArangoDB, and it looks quite interesting.

When did the project start?

Could the Foxx thing be an independent application ?

don71 4240 days ago

ArangoDB was only started in 2012, but many years of experience in developing special-purpose database solutions went into it. This is how the rapid evolution into a market-ready product was possible at all.

Foxx is designed as the extension framework for ArangoDB, so it does not really make sense to rip it out of the DB kernel. Furthermore, a lot of its advantages would vanish if it no longer had immediate and rapid access to the data.

dang 4240 days ago

No astroturfing on HN, please.

shangxiao 4240 days ago

Can you explain how this is astro-turfing?

dang 4240 days ago

It's hard to know for sure, of course, but according to the data we look at, some of these comments appear promotional rather than organic discussion.

That's not to say this isn't a great database, and we admire anyone who's undertaking a hard project. But there are proper and improper ways to get attention on HN. This one appeared to cross a line, hence my comment.
