Hacker News new | ask | show | jobs
by thibaut_barrere 4993 days ago
MongoDB kicks ass in the following situations (real projects I did as a freelancer):

- dealing with semi-structured input (forms with some variability) and storing as a document, all while being able to query across the data

- used as a store to provide very flexible ETL jobs (with ability to upsert, filter/query, geonear etc)

For those situations, I would definitely use MongoDB again. As a RDBMS replacement, I wouldn't use it today.

4 comments

To slightly rephrase the OP's question:

  In what use cases does mongo kick postgres's ass?
To the two points you mentioned:

- semi-structured input can be saved as hstore type or as json type;

- and for flexible jobs, you can use pretty much any popular language - PL/R, PL/Python, even PL/C if performance is really critical.

I would have replied something similar if that was the question :-) (I use PG a lot these days).

Agreed on the first point (but I'm not sure you get exactly the same type of flexibility in all my use cases - I'll have to make a closer comparison).

For the second point, well not having to handle the schema for ETL jobs is sometimes fairly useful and removes a lot of cruft, that was part of my point (those ETL are code-based, only relying on MongoDB as a flexible store).

You can't query JSON easily and hstore is only one level deep. So, no, its not as flexible.
You can't query the json type as easily in postgres. I would guess that is an important use case.
I don't know about JSON type but the annoying thing about hstore is everything is a string; there are no types.
Just because you can write an app using assembler doesn't mean it is the best way to do it.
Very tempted to ask "In what way does postgres compare to an assembler and mongodb to a high level language?" but I think I'll just assume that you're trolling.

Besides, SQL sounds more high-level than map-reduce to me.

For those like me who have no ides what ETL stands for: https://en.wikipedia.org/wiki/Extract,_transform,_load
Thanks for pointing that out!

Here is a presentation (slides + video) I gave about a Ruby ETL, for instance. It illustrates the typical use cases I run into.

http://lanyrd.com/2012/rulu/swxtt/

To elaborate on the semi-structured input point: Monogo and it's kin are great for EAV systems (http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%...), where your entities can have an arbitrary number of fields (often user defined). Trying to build this kind of system in a traditional RDBMs can be quite tricky.
Since we are a mysql shop, for this use-cases I serialize and store the form as xml in a CLOB column. For any field that needs to be searched on, I create an additional column. Disadvantage of this is you can't run mysql queries against the data stored in the CLOB column.
Adding column per searchable field is precisely what I wanted to avoid :-)

If you need to store multiple forms types, it gets hairy very fast.

MongoDB allows to query inside the data (which is not a blob in that case).

So does PostgreSQL, but I have no idea about MySQL. You can run xpath queries against XML documents, and you can also index specific xpath queries (there is no general indexing infrastructure for XML in PostgreSQL). For JSON no path lookup functions are included in the core yet, but I assume there are extensions which add them.
Thanks for elaborating! This is much needed and spot on :-)

A long while back I built a somewhat complex survey app: I can confirm it's fairly more complicated to handle with a RDBMs, compared to a document store.

fwiw, postgresql 9.2 supports a json datatype. you can efficiently access/query json fields using plv8 (http://code.google.com/p/plv8js/).

so you could have:

  create table form_results (                                                                                
    id serial primary key,                                                                            
    data json                                                                                         
  );
http://pgeu-plv8.herokuapp.com/ has more information.
It's just a JSON syntax validator. You can't index on part of the json, postgres treats it as a string.

You can apply full text search on it, but that doesn't tell you if you're matching on a key or a value.

In postgres, you can index on a stored procedure, so you can use a simple stored procedure written in JavaScript to look up by key and return values, etc. Example available at http://people.planetpostgresql.org/andrew/index.php?/archive...
Very interesting presentation, though the title of "heralding the death of nosql" is either intentionally exaggerating, or indicates the author doesn't understand all the reasons why people go to nosql databases. In fact, the presentation demonstrates why: Postgres has tons of really fantastic, awesome features that next to nobody uses because they are hidden behind layers of SQL-type incantations and/or require various extensions.
So you'd rather install a completely new storage engine and learn to use it than check a doc and install an extension to postgres?
The thing is that 10gen did a really, really good job at polishing the install process and documenting it to get people started.

No surprise to see the GIS part of MongoDB is built-in instead of an extension of some kind. I know a couple of people who used PG without even knowing there was a GIS extension.

But on the other hand PostGIS being independent from PostgreSQL has resulted in the best opensource GIS database. And with the recent addition of CREATE EXTENSION the PostgreSQL extension installation process has been heavily streamlined. Before CREATE EXTENSION it was a mess for larger extensions.
PostGIS is miles, miles ahead, for sure! Yet even "create extension" sounds a bit weird to most newcomers (me included at first!), especially against "built-in basic GIS".
is it a good thing to have everything built in?
I don't quite understand how the transition happened from no-json support to json support in those slides. How did the plv8js do this searching on fields?