Hacker News new | ask | show | jobs
Postgres, NoSQL, or other...
2 points by brittonrt 4984 days ago
I am writing an application which stores content (usually text, images, links, etc) and is highly relational (content can be associated with other content and queried as such).

An example would be fetching the 10 most recent submissions by a specific user, sorted by "rating" number which is also associated with a piece of content, as well as all content which has been related to each returned piece of content.

IE Bob requests the 5 most popular data points he's published and all the associated images/links/etc.

I am currently using MongoDB, and have had no issues, but after doing some research, I'm considering switching to an amorphous Postgres implementation where there are only 2 tables, a data table and a relationships table, with Memcached keeping some of the higher level requests precached for quick fetching. This should keep my schema from the application's perspective highly flexible.

I know it's apples and oranges, but would anyone be willing to offer maybe some of their own experiences with similar setups that might help me determine if this is a good decision? My main reason for looking into migrating away from MongoDB is write speed and data volatility. I don't want to ever lose data if possible, and I understand that Mongo has table level locking when writing (is this true still?)

If anyone has any advice on the performance/reliability implications of either or both options, or if anyone thinks there's an even better 3rd option I haven't considered, I would be very interested to hear!

Thanks so much!

2 comments

If you're using that kind of schema you won't be getting any of Postgres' advantages. Obviously it has a pretty good reliability reputation, but I'd be very cautious about switching to Postgres because you're going to be making a lot of work for yourself.

If you're worried about performance, benchmark it (or find someone who has). On the reliability side there are plenty of NoSQL options with better reputations than MongoDB. My first thought is neo4j, but I don't really know enough about your particular problem.

Thanks for the reply. From what I understand, the schema I described is similar to what Reddit uses in their Postgres setup. You mentioned that I "won't be getting any of Postgres' advantages", what are some of the most important considerations that play into Postgres strengths? What are the primary advantages you for Postgres your referring to?

Thanks!

I meant just in terms of Postgres expecting a "standard" relational schema; the interface and query planner and the like are designed around that. It's not a bad schemaless data store per se, but it's not really designed for that use case, so I would expect rough edges when using it that way.

The modern selection of schemaless data stores weren't as mature (if they even existed) when Reddit was getting started, so don't assume that they would choose that architecture if they were starting today. I suspect that if you build on postgres in this way you'll have to write significant pieces of functionality by hand that you get for free with mongodb/couchdb/redis/neo4j/etc.

That said I'll emphasise again that I don't know too much about your system; at some point, prototyping and measuring will be the only way to see what works best for you.

Just checked out neo4j, it looks pretty great, I'm surprised I hadn't heard of it before. Thanks for the heads up!
you should start from giving some numbers i.e. you want it to work for 10mil documents, how ofter they will be updated etc.

if it's less than 1mil and they are not updated frequently, you should go for postgres and have some fun with SQL,

more data and increased concurrency means problems also in postgres, a lot of optimizing, materializing views and other fun problems to solve.