Hacker News new | ask | show | jobs
by threeseed 2460 days ago
This is non-sensical.

Take a highly nested JSON document and try and implement it in a relational database. You would have an O(1) lookup in MongoDB and a O(n) lookup in MySQL. Or good luck traversing a graph structure in MongoDB when you should've used Neo4J. So in order to have a performant database you do need to "pay attention" and ensure that your access patterns suit the database type.

Also Web Scale is about any approach used at the bigger web companies. And the type of people who use the term are not the same people who would be sitting there optimising SQL queries. So I wouldn't conflate the two.

5 comments

> This is non-sensical.

Your examples are kinda proving yourself wrong though.

> Take a highly nested JSON document and try and implement it in a relational database. You would have an O(1) lookup in MongoDB and a O(n) lookup in MySQL.

In PostgreSQL (which is not MySQL, but it's a relational database) you can create an index for a column which contains JSON and the lookup for a nested field becomes O(1).

> Or good luck traversing a graph structure in MongoDB when you should've used Neo4J.

Facebook implemented the social graph on top of MySQL: https://www.facebook.com/notes/facebook-engineering/tao-the-....

> So in order to have a performant database you do need to "pay attention" and ensure that your access patterns suit the database type.

I think the point of the parent poster was that you don't need to ensure that your access pattern suite the database type, but that it fits well with the features the database provides. If you're looking at the database landscape today talking about a "database type" isn't really saying much at all.

That's because you took my examples and distorted them.

I specifically mentioned MongoDB and MySQL. Having a JSON type is basically a mini document database since it uses a different/enhanced query language and is often not relational with the rest of the schema.

I also specifically mentioned MongoDB and Neo4J. Yes you could implement a graph in MySQL but as your paper lists you will need to front it with a Memcache in order to cache the graph traversal lookups. With a graph database you generally don't need such a cache.

That is exactly it. Some SQL databases (CockroachDB and Postgres come to mind) can have the option of having O(1) JSON lookup. It just has to implement the correct algorithm, and index the data the correct way.

It is not about the database type. It is about the underlying algorithms and data structures.

Well, not really. It's about how you store the data. For instance all graph data can be stored as hexastore [0] which can be stored literally in any database and you will get similar-ish performance out of them.

Some databases are optimized for certain workflows, so for instance they will store only "pointers" to certain data rather than copying them. In addition their clients obfuscate and abstract the storage layer from user.

But you can really store any data in any database, you just might have to do some extra pre/post-processing and multiple the storage requirements.

[0] http://karras.rutgers.edu/hexastore.pdf

Technical note: big oh isn't a useful measure here. Most databases use b-trees (yes, even mongo) so lookups are at best O(log(n)). That goes for you and the people replying too.

The constant factors are way more important here. It's a 1000x factor difference depending on how durable you need your data to be (whether you need to write to disk or a quorum of network nodes in multiple regions). That is basically the only thing that mattered in the recent mongo vs postgres benchmarks.

When you say O(1)/O(n), what are you measuring the scale in relation to? Number of documents? Number of keys in the document you're looking for?