Hacker News
by axionike 4515 days ago
ES has performed very well for us as the backbone for the solution we deployed for a large government-sector customer. Had some GC issues initially, and were worried about user concurrency, especially since we were not restricting queries (i.e. users can do full-scale wildcard searches against the entire data set of 1BN+ records). But ES continues to shine.

Congrats to the ElasticSearch team, and all the supporters around it. Once I get back into more of a coding role, I'll definitely be contributing back to the ES project.

1 comments

This may require a lengthier answer than makes sense here, but I'm curious about what was causing your GC issues and how you fixed them (we have GC issues at the moment).
Not the OP, but GC issues in Elasticsearch basically boil down to memory pressure (obviously), which is usually caused by facets. Facets eat a lot of memory, especially if you are faceting high-cardinality fields - think fields like "tags" or any analyzed field. High-cardinality analyzed strings are the easiest way to blow out the heap.

There are other reasons, but that is like 90% of GC issues. To solve it, you need to make sure your faceted fields are configured well (usually not_analyzed) and assess how much memory is available. You may be able to index and even full-text search ten billion docs on a single machine, but faceting it may just be too much to ask for a single node.
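
To make the "configured well (usually not_analyzed)" point concrete, here is a minimal sketch of a mapping plus a terms facet on that field. The type name "doc", the field name "tags", and both helper functions are hypothetical, and the string/not_analyzed syntax assumes a 0.90/1.x-era cluster, so check it against your version before using it.

```python
import json


def not_analyzed_mapping(doc_type, field):
    """Build a mapping that indexes `field` as one untokenized term per value.

    `doc_type` and `field` are illustrative names, not anything from the thread.
    """
    return {
        doc_type: {
            "properties": {
                # not_analyzed: the whole string is a single term, so faceting
                # loads one entry per distinct value instead of one per token.
                field: {"type": "string", "index": "not_analyzed"}
            }
        }
    }


def terms_facet_query(field, size=10):
    """Build a search body that asks for the top `size` terms of `field`."""
    return {
        "query": {"match_all": {}},
        "facets": {field: {"terms": {"field": field, "size": size}}},
    }


mapping = not_analyzed_mapping("doc", "tags")
query = terms_facet_query("tags")
print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

The mapping would be sent when the index is created (or via the put-mapping API) and the query body to _search; a not_analyzed field still costs memory in proportion to its number of distinct values, so this reduces pressure rather than removing it.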

Omitting norms, disabling bloom filters on old indices, and enabling doc values are other ways to help alleviate memory pressure.
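
A rough sketch of what those three knobs could look like as request bodies. Everything here is an assumption about a 1.x-era cluster: the "norms": {"enabled": false} and "doc_values": true mapping options, the index.codec.bloom.load setting name, and the hypothetical field names "body" and "tags". Older and newer versions spell these differently, so verify against the docs for the version you run.

```python
import json

# Hypothetical type "doc" with one full-text field and one facet field.
mapping = {
    "doc": {
        "properties": {
            # Full-text field: skip norms if length/boost scoring is not needed.
            "body": {"type": "string", "norms": {"enabled": False}},
            # Facet field: keep its values in on-disk doc values rather than
            # loading them into heap field data.
            "tags": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
        }
    }
}

# Index-level setting for old, rarely written indices (assumed 1.x name).
old_index_settings = {"index.codec.bloom.load": False}

print(json.dumps(mapping, indent=2))
print(json.dumps(old_index_settings, indent=2))
```

Mapping changes like these generally apply only to newly indexed data, so existing indices would need a reindex to benefit.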

Other GC culprits can be: oversized bulk requests, unbounded threadpool queues, or something like parent/child/scripts/filter cache keys eating all your memory. Also, don't go above 30 GB heaps; the JVM becomes unhappy :) (past roughly 32 GB it can no longer use compressed object pointers).
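
On the bulk-request point, one way to keep requests bounded is to split the stream client-side before sending. This is a sketch, not anything from the ES client libraries: bulk_chunks, its parameters, and the 5 MB / 1000-doc caps are all made-up illustrative values to tune for your own cluster.

```python
import json


def bulk_chunks(docs, index, doc_type, max_bytes=5 * 1024 * 1024, max_docs=1000):
    """Yield newline-delimited bulk bodies capped by byte size and doc count.

    A single document larger than max_bytes is still sent, alone in its chunk.
    """
    lines, size, count = [], 0, 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        source = json.dumps(doc)
        # Each action/source pair contributes two lines, each newline-terminated.
        pair_size = len(action.encode("utf-8")) + len(source.encode("utf-8")) + 2
        if lines and (size + pair_size > max_bytes or count >= max_docs):
            yield "\n".join(lines) + "\n"
            lines, size, count = [], 0, 0
        lines.extend([action, source])
        size += pair_size
        count += 1
    if lines:
        yield "\n".join(lines) + "\n"


docs = [{"n": i} for i in range(2500)]
chunks = list(bulk_chunks(docs, "logs", "event", max_docs=1000))
print([chunk.count("\n") // 2 for chunk in chunks])  # documents per chunk
```

Each yielded string can be POSTed to the _bulk endpoint as-is; capping by bytes as well as by count matters because a few very large documents can make a "small" batch heavy.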