by gane5h
4012 days ago
We store our event stream data in Elasticsearch. Two features made it appealing:
* the ingest side can be scaled up by adding more shards
* the query side can be scaled up by adding more replicas
For rollup analytics, we make heavy use of Elasticsearch's aggregation framework to compute daily/weekly/monthly/quarterly active users. From my understanding, Postgres has many of these features, but the distributed features of ES are killer!
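For readers who haven't used the aggregation framework, here is a rough sketch of what an active-users rollup request might look like: a `date_histogram` bucket with a `cardinality` sub-aggregation that counts distinct users per bucket. The field names `timestamp` and `user_id` and the helper `active_users_query` are made up for illustration, not taken from the setup described above. The `calendar_interval` parameter is what recent Elasticsearch versions use (older releases used `interval`), and `cardinality` returns an approximate count.

```python
import json


def active_users_query(interval="day", time_field="timestamp", user_field="user_id"):
    """Build a search request body that counts distinct users per time bucket.

    The field names are hypothetical; substitute whatever your mapping uses.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "per_bucket": {
                # One bucket per day/week/month/quarter.
                "date_histogram": {
                    "field": time_field,
                    "calendar_interval": interval,
                },
                "aggs": {
                    # Approximate distinct count of users in each bucket.
                    "active_users": {"cardinality": {"field": user_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(active_users_query("week"), indent=2))
```

The resulting dict would be sent as the body of a search request against the event index; passing `"month"` or `"quarter"` instead gives the coarser rollups.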
That said, one major downside to ES is that it's not schemaless. You can try to use the dynamic mapping system, but it will most likely bite you eventually, since ES is strict about coercing data types. If your data isn't completely consistent, it will simply refuse to index it. Any changes made to your schema also require reindexing. (For some reason ES can't reindex in place, despite being able to store all the original data in the "_source" field.)
If your data isn't perfectly consistent, one way to work around the mapping problem is to append a type name to every field. So instead of indexing {"user_id": "3"}, you index {"user_id.string": "3"}. This means that if you get some input data where the user_id is an int, it doesn't conflict, because it will be stored in "user_id.int". You have to handle the inconsistency on the query end, but that's possibly better than micromanaging the index.
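A minimal sketch of that type-suffix trick, applied to a document before it is sent for indexing. The helper `tag_fields` and the suffix names other than "string" and "int" are my own choices for illustration. Lists and nulls are not treated specially here; they just get their Python type name as the suffix. How dots in field names are handled has also varied across Elasticsearch versions, so a different separator may be needed.

```python
# Suffixes for scalar JSON types; "string" and "int" match the example above,
# the others are assumed names.
SUFFIXES = {bool: "bool", int: "int", float: "float", str: "string"}


def tag_fields(doc):
    """Return a copy of doc with each scalar field name suffixed by its type."""
    tagged = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            # Nested objects keep their key; their fields are tagged recursively.
            tagged[key] = tag_fields(value)
            continue
        # Exact-type lookup, so True maps to "bool" rather than "int".
        suffix = SUFFIXES.get(type(value), type(value).__name__)
        tagged[f"{key}.{suffix}"] = value
    return tagged


if __name__ == "__main__":
    print(tag_fields({"user_id": "3"}))
    print(tag_fields({"user_id": 3}))
```

With this, a string user_id and an int user_id end up in separate fields, so neither document is rejected; a query then has to check both variants.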