Hacker News
by tmikaeld 1424 days ago
Full-text search engines will, of course, look slow when benchmarked on queries that aren't full-text search.

It would be interesting to see each engine benchmarked on its intended workload (logs, sorting, full-text, etc.).

Fastest Avg: https://db-benchmarks.com/?cache=fast_avg&engines=elasticsea...

Slowest: https://db-benchmarks.com/?cache=slowest&engines=elasticsear...


Elasticsearch is used not only for full-text search. It's widely used for analytics (aggregations) and filtering too, e.g. when you do log analytics.

Comparing Elasticsearch / Manticore with MySQL may not be the fairest comparison, since they are so different, but comparing them with each other on more than just full-text queries seems fine to me.

It all seems too good to be true, so I'd like to understand more about the limitations.
Well, for one, it's written in C++, which means it is more likely to have memory-safety bugs, and those could be security vulnerabilities.
While that's generally true, I'd argue that for the use cases where full-text search is mostly used (e.g. searching a public database or, quite the opposite, an internal system that searches logs collected from various sources), security vulnerabilities matter less in practice. Even if a vulnerability lets you expose some data stored in the full-text index, it would usually only expose data you could already find through that search engine and that is already accessible to you :).
That might be true in some cases.

But for the public data case, you probably still need to worry about DoS or data corruption.

In the logs case, a malicious actor can probably control at least part of the logs, so if a bug leads to arbitrary code execution, a bad actor could possibly get all kinds of valuable data.

Also, just to be clear, the language doesn't necessarily mean there are significant security bugs. A well-written C++ app is probably better than a poorly written Java app. It's just harder to avoid memory bugs in C++ than in Java.

I'd say (full-text) search is one of the least interesting features of ES; the aggregations are its USP. For example, moving averages on (biggish) datasets can be calculated on very cheap hardware.
If you're not interested in fast full-text search, then you're wasting a lot of resources on a workload that more specialized analytics databases can serve far more easily. The entire storage and retrieval design of these engines is built largely around doing Lucene-style searches at extreme speed.
What would be a good self-hostable replacement for ES aggregations? I'm also quite fond of ClickHouse, which is a lot faster still, but the sheer number of products that have popped up in the last decade always makes me wonder whether there are even faster solutions out there.
We use ClickHouse a lot, but there's also InfluxDB (which apparently has a SaaS version now too), and you could look at Druid or even the Hadoop family of products. The latter two are probably better suited to building entire analytics workflows rather than just a storage tier. If you're looking for a more workflow-managed solution, you could also look into Apache Flink, which covers a lot of similar use cases.
Thank you. I think I'll try to get the most out of ClickHouse, not least because it is extremely easy to maintain and keep running.
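For anyone unfamiliar with what such an aggregation looks like, here is a rough sketch that only builds the request body (it doesn't talk to a cluster). It assumes Elasticsearch 7.2+ (for `calendar_interval`; `moving_fn` exists since 6.4), and the field names `@timestamp` and `response_time_ms` are hypothetical placeholders for whatever your log index actually contains.

```python
import json


def moving_avg_request(timestamp_field: str, value_field: str,
                       interval: str = "1d", window: int = 7) -> dict:
    """Build an Elasticsearch _search body with a moving average.

    Buckets documents per calendar interval, averages `value_field`
    in each bucket, then applies an unweighted moving average over
    the last `window` bucket averages. Field names are placeholders.
    """
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "per_bucket": {
                "date_histogram": {
                    "field": timestamp_field,
                    "calendar_interval": interval,
                },
                "aggs": {
                    # per-bucket average of the metric
                    "avg_value": {"avg": {"field": value_field}},
                    # pipeline agg over the sibling per-bucket averages
                    "moving_avg": {
                        "moving_fn": {
                            "buckets_path": "avg_value",
                            "window": window,
                            "script": "MovingFunctions.unweightedAvg(values)",
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Body you would POST to /<your-index>/_search
    print(json.dumps(moving_avg_request("@timestamp", "response_time_ms"), indent=2))
```

Because `size` is 0, the cluster only has to compute and return the buckets, which is part of why this kind of query stays cheap.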
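To illustrate the point about storage being built around search, here is a toy inverted index: it maps each term to the set of documents containing it, so a term query only touches the matching postings. This is a heavily simplified sketch (whitespace tokenization, Python sets); real Lucene uses analyzers and compressed, sorted posting lists with skip data. An aggregation such as an average over a field, by contrast, has to read every matching document's value, which is the access pattern columnar analytics databases are optimized for.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase whitespace-separated token to the ids of docs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


def search_all_terms(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: intersect the posting sets of every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    result = set(postings[0])  # copy so the index itself is not mutated
    for posting in postings[1:]:
        result &= posting
    return result


if __name__ == "__main__":
    logs = {
        1: "disk full on node-a",
        2: "node-b restarted",
        3: "disk latency high on node-b",
    }
    idx = build_inverted_index(logs)
    print(search_all_terms(idx, "disk node-b"))  # {3}
```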