Hacker News
by tmcneal 5399 days ago
To be fair, CouchDB is very explicit that to get any sort of performance, everything must be a view. "Ad-hoc queries" (i.e. queries that are written on the fly instead of uploaded as a view) are clearly documented as "for development only".
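
For anyone who hasn't uploaded a view before, here is a rough sketch of what "everything must be a view" means in practice: the map and reduce functions live as JavaScript source strings inside a design document. The design document name `reports`, the view name `count_by_type`, and the `doc.type` field are all made up for illustration; `_count` is one of CouchDB's built-in reduce functions.

```python
import json


def build_design_doc(name, views):
    """Return a CouchDB design document as a plain dict.

    `views` maps a view name to a (map_source, reduce_source_or_None) pair,
    where each source is JavaScript kept as a string.
    """
    doc = {"_id": f"_design/{name}", "language": "javascript", "views": {}}
    for view_name, (map_src, reduce_src) in views.items():
        view = {"map": map_src}
        if reduce_src is not None:
            view["reduce"] = reduce_src
        doc["views"][view_name] = view
    return doc


# Hypothetical view: count documents per `type` field.
design = build_design_doc(
    "reports",
    {
        "count_by_type": (
            "function (doc) { if (doc.type) { emit(doc.type, 1); } }",
            "_count",
        )
    },
)

# JSON body to PUT to /<db>/_design/reports on the CouchDB server.
body = json.dumps(design, indent=2)
print(body)
```

Once the document is saved, the view is queried with a GET on `/<db>/_design/reports/_view/count_by_type`, and the index is built and kept on disk rather than recomputed per query.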

Where CouchDB really falls flat is write-heavy applications. By default, CouchDB does not reindex a view until it is read. When a read occurs, any data added since the last read must be indexed by running the map/reduce functions over it. If you're writing frequently to CouchDB but not reading a lot (as in a data warehouse), the first query you run is going to be extremely slow, since it has to run map/reduce on a lot of new data. CouchDB doesn't distribute work across multiple nodes like Hadoop does, and I've found that even simple reduce functions slow down reindexing by a factor of 10. I think CouchDB has settings now to update the index on commit, or you could always run a cron job to regularly query the view and force a reindex, but it's still going to be slow.
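
The cron job idea could look something like the sketch below. It assumes a CouchDB server reachable over plain HTTP with no authentication, and the database, design document, and view names in the usage note are hypothetical. Asking for `limit=0` is meant to make the server bring the index up to date while returning no rows, so the script pays the indexing cost instead of the first real reader.

```python
import json
import urllib.parse
import urllib.request


def view_url(base_url, db, design, view, **params):
    """Build the URL for querying a view in a design document."""
    parts = (db, "_design", design, "_view", view)
    path = "/".join(urllib.parse.quote(p, safe="") for p in parts)
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/{path}"
    return f"{url}?{query}" if query else url


def warm_view(base_url, db, design, view, timeout=600):
    """Query the view with limit=0 so the index catches up on new writes.

    The generous timeout is deliberate: the request blocks while the
    server indexes whatever has been written since the last read.
    """
    url = view_url(base_url, db, design, view, limit=0)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Run from cron every few minutes, a call such as `warm_view("http://localhost:5984", "warehouse", "reports", "count_by_type")` keeps each indexing pass small. It doesn't make indexing any cheaper overall; it only moves the wait away from whoever runs the first query. CouchDB 1.x also documents a `stale=ok` query parameter that serves the existing index without updating it, so check the docs for your version before relying on either behavior.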

BigCouch (https://cloudant.com/#!/solutions/bigcouch) might be a good fit for data warehousing, since it advertises full compatibility with the CouchDB API but offers distributed map/reduce like Hadoop/Hive/etc. I haven't used it though.

1 comment

Couch is definitely a lot more honest about its limitations than Mongo or Riak, but my experiences make me hesitant to recommend it to anyone not intimately familiar with those limitations.