by physcab
4886 days ago
This is pretty cool, but I'm struggling to see what the use cases are, at least for analysis. There might be quite a few benefits for running application code that I'm not aware of. With regard to analysis, though, their own example question is "what happened last night?", but then they go on to say that it is a near real-time data store. Does it matter that it is a real-time mirror, then? I've always liked the paradigm of doing analysis on "slower" data stores, such as Hadoop+Hive, or Vertica if you have the money. Decoupling analysis tools from application tools is both convenient and necessary as your organization and data scale.
|
PostgreSQL scales surprisingly well for this purpose, and is much nicer for interactive queries than Hadoop/Hive. We use Impala[1] for some larger datasets, but Impala is comparatively new, and it's nice to have something as battle-tested as Postgres here.
As for "why do we need real time?": in my mind, the benefit of a near-real-time replica is not that you often actually need it, but that you never have to ask "Was this snapshot refreshed recently enough?", and never end up waiting several hours for an enormous dump/load operation when you realize you did need newer data.
[1] http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...