by foobarbazetc 5334 days ago
No shit, nmongo.

Anyone with half a brain can go look at the MongoDB codebase and deduce that it's amateur hour.

It's startup-quality code, but it's supposed to keep your data safe. That's pretty much the issue here -- "cultural problems" is just another way of saying the same thing.

Compare the code base of something like PostgreSQL to Mongo, and you'll see how a real database should be coded. Even MySQL looks like it's written by the world's best programmers compared to Mongo.

I'm not trying to hate on Mongo or their programmers here, but you've basically paid the price for falling for HN hype.

Most RDBMSes have been around for 10+ years, so it's going to take a long, long time for Mongo to catch up in quality. But it won't, because once you start removing the write lock and all the other easy wins, you're going to hit the same problems that people solved 30 years ago, and your request rates are going to fall to memory/spindle speed.

Nothing's free.

I think the discussion here also misses an important aspect of the conversation which is about application data modeling. Mongo will sooner or later reach a "stable" level as it matures just as mysql, postgres and all other datastores have done. I picked mongo due to the good fit it had to the problems I needed solved not only from the server perspective but from the modeling perspective. The ease of ad-hoc queries and the schemaless nature of the db lent itself well to the kind of problems I wanted to solve.

So even if in 30 years it's got the same characteristics as our current dominant data storage models, I consider it a net win that I will be able to use a document-oriented database for development over a more traditional RDBMS for some of my applications.

The richer our toolset is, the better off we are, as not every problem is a nail to be hammered in with an RDBMS.

So a high five to all the people who dare go against convention and take a chance on a new approach to data modeling, be it Mongo, Riak, CouchDB, Redis, Neo4j, Cassandra, HBase or any other awesome open source project out there.

These are not new approaches to data modelling.

Document databases, network databases and hierarchical databases (IMS, CODASYL etc) predate relational databases by decades.

Relational is the universal default for a simple reason. When first introduced it proved to be far better, in every conceivable way, than the technologies it replaced.

It's as simple as that. Relational is a slam-dunk, no-brainer for 99.99% of use cases.

Still, if you really want a fast, proven system for one of the older models, you can get IBM to host stuff for you on a z/OS or z/TPF instance, running IMS. It'll have more predictable performance than AWS to boot.

I agree entirely - I think when people rebel against "relational databases" they're actually just realizing that the normalization fetish can be harmful in many application cases.

You're better off with MySQL or PostgreSQL managing a key-value table where the value is a blob of JSON (or XML, which I've done in the past), then defining a custom index, which is pretty damn easy in PostgreSQL. Then you have hundreds of genius-years of effort keeping everything stable, and you still get NoSQL's benefits. Everybody wins.

Normalization is a tricky thing. On one hand, highly normalized databases have better flexibility in reporting, IMHO. On the other, you lose some expressiveness regarding data constraints. High degrees of normalization would be ideal if cross-relation constraints were possible. As they are not, typically one has to normalize in part based on constraint dependencies just as much as data dependencies.

First, the more I have looked, the more I have found that non-relational database systems are remarkably common and have been for a long time.

The relational model is ideal in many circumstances. However, it breaks down in semi-structured content, content where---parentheses for grouping---(hierarchical structure is important, data is seldom written and frequently read, and where read performance navigating the hierarchy is most important) and so forth.

So I'd generally agree, but not every problem is in fact a nail.

> However, it breaks down in semi-structured content, content where---parentheses for grouping---(hierarchical structure is important, data is seldom written and frequently read, and where read performance navigating the hierarchy is most important) and so forth.

Again, this problem is not new. Database greybeards call this OLAP and it's been around since the 80s.

There is nothing new under the sun in this trade.

No. I am talking about something like LDAP, not OLAP. LDAP may suck badly in many many ways but it is almost exactly not like OLAP.

OLAP is typically used to refer to environments which provide complex reports quickly across huge datasets, so a lot of materialized views, summary tables, and the like may be used (as well as CUBEs and the like). Hierarchical directories are different. In a relational model you have to traverse the hierarchy to get the single record you want, and you are not aggregating like you typically do in an OnLine Analytical Processing environment.

This is why OpenLDAP with a PostgreSQL backend sucks, while OpenLDAP with a non-relational backend (say BDB) does ok.

I am not saying anything new is under the sun, just that some of the old structures haven't gone away.

I was referring to the read/write preponderance. Normalisation optimises write performance, storage space and also provides strong confidence of integrity. But it means lots of joins, which can slow things down on the read side.

That's why OLAP came along. Structured denormalisation, usually into star schemata, that provide fast ad-hoc querying. I think part of the enthusiasm for NoSQL arises because most university courses and introductory database books will go into normalisation in great detail, but OLAP might only get name checked. So folk can get an incomplete impression of what relational systems can do.
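As a rough illustration of the star schema idea mentioned above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real OLAP system. The table and column names (fact_sales, dim_product, dim_date) and the sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database; a real warehouse would use a dedicated engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive, one row per product or date.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
-- Fact table: one row per sale, pointing at the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product,
    date_id    INTEGER REFERENCES dim_date,
    amount     INTEGER
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO dim_date    VALUES (10, 2010), (11, 2011);
INSERT INTO fact_sales  VALUES (1, 10, 5), (1, 11, 7), (2, 11, 20), (2, 11, 3);
""")


def totals_by_category_and_year():
    """Ad-hoc report: every dimension is exactly one join from the facts."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


report = totals_by_category_and_year()
```

With the sample rows, report holds one total per (category, year) pair. The point of the shape is that any new slice of the data is another GROUP BY over the same short joins, not a walk through a deeply normalised schema.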

If I had a purely K/V data problem -- a cache, for example -- I would turn to a pure K/V toolset. Memcache, for example.

Hierarchical datasets have long been the blindside for relational systems. Representable, but usually requiring fiddly schemes. But in the last decade SQL has gotten recursive queries, so it's not as big a problem as it used to be.
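For anyone who hasn't seen a recursive query, here is a minimal sketch of one walking a parent/child table from a leaf up to the root. It uses Python's built-in sqlite3 module purely for convenience; the node table, its columns and the sample directory-like data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Adjacency list: each row points at its parent (NULL for the root).
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "com"),
        (2, 1, "example"),
        (3, 2, "people"),
        (4, 3, "alice"),
        (5, 2, "groups"),
    ],
)

# Recursive common table expression: start at one node, then repeatedly
# join to the parent of the row found in the previous step.
PATH_TO_ROOT = """
WITH RECURSIVE chain(id, parent_id, name, depth) AS (
    SELECT id, parent_id, name, 0 FROM node WHERE id = ?
    UNION ALL
    SELECT n.id, n.parent_id, n.name, chain.depth + 1
    FROM node AS n JOIN chain ON n.id = chain.parent_id
)
SELECT name FROM chain ORDER BY depth
"""


def path_to_root(node_id):
    """Return the names from the given node up to the root, leaf first."""
    return [name for (name,) in conn.execute(PATH_TO_ROOT, (node_id,))]


path = path_to_root(4)
```

The recursion stops by itself once it reaches a row whose parent_id is NULL, because the join then matches nothing. The database still visits one row per level, so this makes hierarchies expressible in a single query without making the traversal free.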

> When first introduced it proved to be far better, in every conceivable way, than the technologies it replaced.

That's not exactly true; what they did was offer a generic query and constraint model that worked well in all cases while offering reasonable performance. They were not generally faster in optimal cases, but they were much easier to query especially given new requirements after the fact because the queries weren't baked into the data model itself. That generic query ability and general data model always come at the cost of speed; always. Document databases have always been faster in the optimal use case.

You're absolutely right -- RDBMSes were designed to solve problems with the nosql-type approaches that preceded them. The nosql bandwagon is blindly rolling into the past, where it will crash into the old problems of concurrency and consistency under load.

BTW if you want nosql-style schema flexibility within an RDBMS, then a simple solution is to store XML or JSON in a character blob. Keep the fields you need to search over in separate indexed fields. If you make incompatible version changes, then add a new json/xml field.
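A minimal sketch of that layout, assuming a single searchable field. It uses Python's built-in sqlite3 and json modules as a stand-in for whatever RDBMS you actually run; the docs table and the email and body columns are hypothetical names chosen for the example.

```python
import json
import sqlite3

# Hypothetical schema: "docs", "email" and "body" are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT NOT NULL,"   # searchable field, kept outside the blob
    " body TEXT NOT NULL)"    # the schemaless part, stored as JSON text
)
conn.execute("CREATE INDEX docs_email_idx ON docs (email)")


def save(doc):
    """Store the whole document as JSON and copy the searchable field out."""
    cur = conn.execute(
        "INSERT INTO docs (email, body) VALUES (?, ?)",
        (doc["email"], json.dumps(doc)),
    )
    return cur.lastrowid


def find_by_email(email):
    """Look up via the indexed column, then decode the matching blobs."""
    rows = conn.execute(
        "SELECT body FROM docs WHERE email = ? ORDER BY id", (email,)
    )
    return [json.loads(body) for (body,) in rows]


save({"email": "a@example.com", "name": "Ann", "tags": ["admin"]})
save({"email": "b@example.com", "name": "Bob", "age": 41})  # different shape
found = find_by_email("b@example.com")
```

The two saved documents have different fields and the table doesn't care; only the email column is constrained and indexed. The trade-off is that anything left inside the blob can only be filtered by decoding it in the application.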

Another solution is to use the hstore feature in postgres to store key value data.

> BTW if you want nosql-style schema flexibility within an RDBMS, then a simple solution is to store XML or JSON in a character blob.

In all sincerity, I would strongly recommend against this. If your problem absolutely defies normalisation, don't use a relational database.

Very true, but it's a resurgence of modeling alternatives, which can only help to enrich our ability to write interesting applications. Yes, you can model a social network in an RDBMS, but it's not as efficient or as flexible as using Neo4j. Or yes, you can model a key-value document in an RDBMS, but again it's not a good fit. The right tool for the right problem. You don't build a house with only a hammer, so why should we build applications on only one storage concept?

I've opened the source code (on GitHub), but didn't really understand it. The code seems readable, though.

Would you care to provide some examples for those not familiar with proper C++/Boost development practices, please?

I'm curious and I might be missing more than half of my brain. Would you be willing to show some examples of bad coding on their source tree?

I looked at using BSON in a project a while back, and ended up scrapping it mainly due to perceived poor code quality. Plenty of potential errors ignored, unclear error messages, unsafe practices.

I was also turned off by the sloppy use of memory. Heap allocated objects returned from functions with poor checks to see if anyone manages that memory on the other side. Lots of instances of strcmp, strcpy and similar unsafe string/buffer manipulation functions.

It's been a while since I looked at it so I don't have any particular examples at hand, but that was my impression.

I haven't ever used MongoDB but got interested, and the first non-trivial source file I picked is this: https://github.com/mongodb/mongo/blob/master/db/btree.cpp

Take a look at for example: bool BtreeBucket<V>::find

Without even thinking about what it is doing, it's quite clear that it is not readable code, and it's not immediately obvious what the high-level structure of the logic is. The function does not even fit into two screens, so it's hard to reason about; your short-term memory is overused.

this is the implementation of a b+ tree. the underlying logic has been very well researched since the 70s.

if there is a part of mongodb that I am sure does not contain bugs, it is that very file you link to.

if you want to know what it does, go out and read the relevant papers on data base technology. or graduate in CS.

Clearly you didn't actually read the source file. I graduated in CS. I know B+ trees.

I also know that an 85-line, 7-argument method in a 1988-line file shouldn't depend on a global variable ("guessIncreasing") modified from several other, unrelated functions. I know that bt_insert, which (apparently) assigns to "guessIncreasing" and then resets it to false just prior to exit, should be using an RAII class to do so instead of trying to catch every exit path, especially in a codebase that uses exceptions.

This code is amateur hour.
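To make the RAII point concrete for readers who don't write C++: the idea is to tie the reset of the flag to leaving a scope, instead of repeating it before every return. The sketch below is not MongoDB code and not C++; it shows the closest Python analogue, a context manager, and the names state, flag_set and insert are invented stand-ins.

```python
from contextlib import contextmanager

# Hypothetical shared state standing in for the global flag in the thread.
state = {"guess_increasing": False}


@contextmanager
def flag_set(value):
    """Set the flag for the duration of a block and always restore it."""
    previous = state["guess_increasing"]
    state["guess_increasing"] = value
    try:
        yield
    finally:
        # Runs on normal exit, early return and exceptions alike.
        state["guess_increasing"] = previous


def insert(key, fail=False):
    """Toy stand-in for an insert routine with several exit paths."""
    with flag_set(True):
        if key is None:
            return False  # early return: the flag is still restored
        if fail:
            raise RuntimeError("simulated failure")  # so is this path
        return state["guess_increasing"]  # flag is visible inside the block
```

Whichever way insert leaves the with block, the flag goes back to its previous value. In C++ the same guarantee would come from a small guard object whose destructor does the restore, which is what the RAII suggestion amounts to.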

Thanks for attacking me personally. But I have no interest in pursuing it further. I made claims that clearly hold true, and they have nothing to do with what you said (I did not say anything about bugs, for example).

That is characteristic of mathematical code, like btree. (ranty aside: being able to recognize this and find out information regarding btree for maintenance is (or should be) one of the key reasons to get a CS degree)

I found the btree file relatively readable. Some macro stuff is not familiar to me, but I am sure I could figure it out in a few hours if I felt like it. And I haven't yet gotten around to implementing a full-on btree, ever.