by nelhage
4881 days ago
(I wrote MoSQL)

PostgreSQL scales surprisingly well for this purpose, and is much nicer for interactive queries than Hadoop/Hive. We use Impala[1] for some larger datasets, but Impala is comparatively new, and it's nice to have something as battle-tested as postgres here.

As for the "why do we need realtime?": In my mind the benefit of a near-realtime replica is not that you actually often need it, but that it means you never have to ask the question of "Was this snapshot refreshed recently enough?", and never end up having to wait several hours for an enormous dump/load operation, when you realize you did need newer data.

[1] http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...

I do agree that PostgreSQL would be nicer for interactive queries. Waiting for an M/R job to spin up is a bit of a buzzkill.
With regard to your use cases, what sort of questions have you found yourself answering the most? Do you have analytics applications running off of this?