Hacker News new | ask | show | jobs
by gfodor 4883 days ago
FYI you can store unstructured data in PostgreSQL (and query it) with the introduction of hstore. So knock one more reason to use MongoDB instead of PostgreSQL off your list. (Disclaimer: the length of my list to use MongoDB has always been a constant that is less than one.)

http://www.postgresql.org/docs/9.1/static/hstore.html

2 comments

Wow, hstore really isn't a great alternative to an actual document DB. The "better" Postgres option would be a JSON type and functional indexes.
There is a JSON type but it just validates content.

HSTORE can be fully indexed (gIST and GIN). Just have to roll your own object graphs for nesting if that's what you need to do.

I swear I have typed this exact same comment previously. Deja vu, maybe

JSON type gives you some typed values within the doc, multi-level nesting, etc. You can add functional indexes (http://www.postgresql.org/docs/9.1/static/indexes-expression...) to index specific attributes within the JSON, do legit sorts over values, reasonable array queries, etc. It seems much, much closer to what Mongo does than anything you can do with hstore.
I think you just restated my comment. Do you believe expression indexes do not apply to HSTORE?

I consider both HSTORE (key/value) and the current JSON type and record functions are just intermediate steps to a fuller API [0].

[0]: http://www.postgresql.org/message-id/50EC971C.3040003@dunsla...

> JSON type and functional indexes

Those "Indexes on Expressions" are really a great feature that can also be combined with XML (not just JSON) and any other types. I recommend everyone to have a look at those:

http://www.postgresql.org/docs/9.2/static/indexes-expression...

Is there any way in those expressions to parse JSON and perform arbitrary calculations - i.e. like CouchDB views?
Sure, you can write server side procedures in f.ex. javascript that do arbitrary things with the json.
To be fair, one should note that "only" the languages C, Python, Perl and Tcl [1] are officially supported by PostgreSQL. Also, there are 3rd-party bindings for other languages such as Java, PHP, R, Ruby, Scheme, sh:

http://www.postgresql.org/docs/9.2/interactive/external-pl.h...

However, the PostgreSQL documentation doesn't mention JavaScript support anywhere. Are you sure there exists mature PL/JavaScript binding for PostgreSQL? If so, their docs should be updated.

[1] There's also "pgSQL", but that's a special-purpose language you won't find outside the database world. I don't recommend learning it unless you have strange requirements that make PL/pgSQL a perfect fit. For normale usage, use PL/Python or PL/Perl. In simple cases, use SQL directly.

Yes, #1 hit when you Google for postgresql JavaScript: https://code.google.com/p/plv8js/

I can't vouch for any particular maturity level but seems to have active users and it's been around a few years already.

> So knock one more reason to use MongoDB instead of PostgreSQL off your list.

One of the reasons MongoDB is so popular is because it is an fantastic database for developers. As a Java developer I can deal in my code with sets, hashmaps, embedded structures and have it effectively map 1-1 in the database. It's akin to an object database meaning you can focus higher up in the stack.

With the SQL ORMs you can't avoid having to deal with the ER model.

Of course, the problem with that approach is you don't have anything enforcing any sort of data integrity below the application. In my experience most of the time you actually can put down on paper a schema and a set of rules the data should obey without too much fear of it changing dramatically. The nice thing about hstore is it allows you the flexibility to introduce unstructured data in just the places where a schema is unknowable or not worth the complexity.

MongoDB et all basically are built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades worth of database research.

> MongoDB et all basically are built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades worth of database research.

Unless MongoDB et al are saying "always use MongoDB et al and never an RDBMS", then I'm not sure how you arrived at the conclusion that "the schema is never worth the complexity."

If anything, the appropriate assumption is, "schemas aren't always worth the complexity." When they are, you use an RDBMS. When they aren't, you don't bother with the data integrity constraints.

The "right tool for the job" mantra often cited whereby you run N different data stores for different use cases heavily discounts the true implication of running multiple data stores: you have to run multiple data stores. You have more ways to get burned by your lack of expertise. You need more eyeballs for the same amount of confidence in your system since those will probably need to be different types of experts. You need to know how to monitor them and tune them. Discussion about which data store to use for a given use case becomes a constant drag on discussions. There is less consistency in modeling since you have to work with multiple paradigms. Your software needs to be built to be able to deal with multiple data stores. All your export/import/backup/etc software efforts that are 1-to-1 with each data store need to be multiplied.

The bottom line is if you drop in a second data store because you have a few fields in your database that are a pain to model with a schema, you are doing yourself a disservice compared to just doing ALTER COLUMN foo hstore.

My colleague mcfunley wrote an article about this blind spot when people talk about these issues:

http://mcfunley.com/why-mongodb-never-worked-out-at-etsy

While I agree that is often a blind spot, it is a red herring to this statement made by you:

> MongoDB et all basically are built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades worth of database research.

You may well argue that if you have N-1 applications using PostgreSQL, and the Nth application could---on its own---justifiably use MongoDB, then it is still appropriate to use PostgreSQL in favor of not adding Yet Another DB Engine.

But that is nothing more than a specific case that is often ignored in the "best tool for the job mantra". It does not mean that schemas are never worth the complexity of an RDBMS.

All I'm saying is that you can't claim that a recommendation of MongoDB assumes schemas are never worth the complexity; you can only claim that the assumption is that they are sometimes not worth the complexity.

More generally, MongoDB makes no assumption that contradicts "years of DB research."

Please don't confuse problems with SQL ORMs with SQL itself. SQL stores are powerful, flexible, and quite easily queryable. MongoDB is only a good database for developers if it solves the problems that you need to solve in a way that causes no impedance mismatch.

And for the record, we use both a SQL store, Redis and MongoDB where the use case suits it where I work.

I am under no confusion. SQL ORMs all suffer from the same problems (which is forced by the underlying SQL model) that MongoDB does not.

And your whole "use the right tool for the right job" goes without saying. It's others who seem to be obsessed with this "SQL is perfect for everything" delusion.

I'm under the impression that Mongo is merely hiding away some complexities, instead of truly resolving them.
Can you be more specific about what you mean by that?
Some of the many mistakes I have made are that of being " lazy" and instead of creating some tables and some schema structure and all the related code I just serialized the stuff and stored it as a string in the database. A few month later or less a very simple be requirement arise and I have to filter by a thing inside the serialization string. Then I say ok, let's index separately the thing, or store it independently, or do two pass filtering, but none of these beats the very simple query I could have done if I had done the right thing in the first place. I hid the complexity away but it came back to me weeks later and then it really stunk: the choice is between many evils, migrate existing data, duplicate the thing, select by regexp, etc.
impression (Noun) 1. An idea, feeling, or opinion about something or someone, esp. one formed without conscious thought or on the basis of little evidence.