|
|
|
|
|
by tmcneal
5399 days ago
|
|
To be fair CouchDB is very explicit that to get any sort of performance, everything must be a view. "Ad-hoc queries" (i.e. queries that are written on the fly instead of uploaded as a view) are clearly stated as "for development only". Where CouchDB really falls flat is for write-heavy applications. The default configuration in CouchDB is to not reindex a view until it has been read. When a read occurs, any new data in a view that was added since the last read must be re-indexed by executing the map/reduce functions on that data. If you're writing frequently to CouchDB but not reading a lot (as in a data warehouse) the first query you run is going to be extremely slow, since it will need to run map/reduce on a lot of new data. CouchDB doesn't distribute work to multiple nodes like Hadoop, and I've found even simple reduce functions to slow down re-indexing by a factor of 10. I think CouchDB has settings now to update the index on commit, or you could always run a cron job to regularly query the view and force a reindex, but it's still going to be slow. BigCouch (https://cloudant.com/#!/solutions/bigcouch) might be a potential choice for data warehousing, since it advertises full compatibility with the CouchDB API but offers distributed map/reduce like Hadoop/Hive/etc. I haven't used it though. |
|