by gpderetta
1721 days ago
I have little knowledge of the log aggregation domain, but generally indices are great for read-mostly loads. It seems to me that for log aggregation, writes are more frequent than searches: cheap writes and the occasional brute-force search. For alerting, you might be better off running each new line against a set of filters/watchers. It seems wasteful to run it after indexing. Again, I have no experience or knowledge of the domain, so I might be completely off.
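To make the filter/watcher idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Watcher` type, the `scan` function, and the two regex patterns are hypothetical names, not part of any log aggregation product. It only shows that each incoming line can be checked against a fixed set of patterns at write time, with no index involved.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Watcher:
    """A named pattern to test against every incoming log line (hypothetical type)."""
    name: str
    pattern: "re.Pattern[str]"


def scan(lines: Iterable[str], watchers: List[Watcher]) -> Iterator[Tuple[str, str]]:
    """Yield (watcher name, line) for every watcher whose pattern matches a line.

    Each line is checked once as it arrives; nothing is indexed first.
    """
    for line in lines:
        for watcher in watchers:
            if watcher.pattern.search(line):
                yield (watcher.name, line)


# Illustrative watchers only; real alert rules would come from configuration.
WATCHERS = [
    Watcher("errors", re.compile(r"\bERROR\b")),
    Watcher("slow_requests", re.compile(r"took=(\d{4,})ms")),  # 1000 ms or more
]

if __name__ == "__main__":
    incoming = ["INFO ok", "ERROR disk full", "INFO took=1500ms"]
    for name, line in scan(incoming, WATCHERS):
        print(f"[{name}] {line}")
```

The cost per line grows with the number of watchers, so this only stays cheap while the watcher set is small compared to the volume of stored logs.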
Generally, you write to and read from the same index in Elasticsearch. Where this falls apart is that you'll often want to change the configuration for an index based on whether it's write heavy or read heavy. The main thing that changes in this scenario is the number of primary and replica shards (each shard is a Lucene index) for the Elasticsearch index.
Indices with a high write, low search workload will generally require more primary shards and fewer replicas. Low write, high search workloads require the opposite: fewer primaries and more replicas.
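As a rough sketch of what that trade-off looks like in configuration, the Python below builds two index settings bodies. `number_of_shards` and `number_of_replicas` are the real Elasticsearch index settings; the helper names and the specific counts (6/1 and 2/3) are made-up illustrations, not recommendations.

```python
def index_settings(primaries: int, replicas: int) -> dict:
    """Build a settings body for creating an Elasticsearch index.

    `replicas` is the number of replica copies per primary shard.
    """
    if primaries < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("replica count cannot be negative")
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shards(body: dict) -> int:
    """Total shard copies the cluster must host: primaries plus all replicas."""
    index = body["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


# Illustrative values only; real numbers depend on data volume and hardware.
WRITE_HEAVY = index_settings(primaries=6, replicas=1)  # spread indexing load
READ_HEAVY = index_settings(primaries=2, replicas=3)   # more copies to search

if __name__ == "__main__":
    print("write heavy shard copies:", total_shards(WRITE_HEAVY))
    print("read heavy shard copies:", total_shards(READ_HEAVY))
```

A body like this would be sent when the index is created. The primary shard count is fixed at creation time, while the replica count can be changed on a live index, which is part of why the two workloads end up configured differently.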
The problem comes when you need high write and high search rates. Using a single cluster with lots of primaries and lots of replicas will overwhelm the hosts, and you end up with terrible performance. The general pattern with Elasticsearch is to run two clusters: index into one cluster, then use cross-cluster replication (CCR) to copy the data into a second cluster that you run queries against.
There's an incredible amount of nuance to all of this. I've worked with many clusters and they all have different usage and configuration requirements. There's no magic formula for calculating configuration values; it all comes down to experience, monitoring, and experimentation.