Hacker News
by trengrj 1244 days ago
I'm not a fan of competitors creating benchmarks like this, because when faced with any tuning decision they will usually pick the option that makes their competitors slower. But anyway, let's take a look at how they tuned Elasticsearch.

Disclaimer: I used to work at Elastic!

- Used Logstash instead of Beats for the simple task of reading syslog JSON data. Beats (https://www.elastic.co/guide/en/beats/filebeat/current/fileb...) would have performed better, especially in terms of resource usage.

- Set a very low Logstash heap of 256 MB: https://github.com/SigNoz/logs-benchmark/blob/0b2451e6108d8f...

- Added a grok processor (https://github.com/SigNoz/logs-benchmark/blob/0b2451e6108d8f...); dissect is faster here.

- No index template configuration. This would cause higher disk usage than needed due to duplicate mappings (again, a Logstash vs Beats thing). For this test, more primary shards and a longer refresh interval would also improve things.

- The graph complains about Elasticsearch using 60% of available memory. This is as configured; they could use less without much impact on performance.

- Document counts do not match. This is probably due to using syslog with randomly generated data instead of creating a test dataset on disk and reading the same data into all platforms.

- Aggregation queries were not provided in the repo (https://github.com/SigNoz/logs-benchmark), so I cannot validate them.
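To illustrate the grok vs dissect point above: grok matches each line against regular expressions, while dissect just splits on literal delimiters. The Python sketch below mimics both approaches on a hypothetical space-delimited log line; the line format, field names and the `parse_regex`/`parse_dissect` helpers are assumptions for illustration, not how Logstash implements either filter internally.

```python
import re
import timeit

# Hypothetical log line format: "<timestamp> <host> <level> <message...>"
LINE = "2022-11-21T10:15:00Z web-01 INFO user login succeeded"

# Grok-style: a regular expression with named groups (grok patterns expand to regexes).
GROK_LIKE = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_regex(line):
    """Regex-based parsing, similar in spirit to a grok filter."""
    m = GROK_LIKE.match(line)
    return m.groupdict() if m else None


def parse_dissect(line):
    """Delimiter-based parsing, similar in spirit to a dissect filter:
    split on the first three spaces and keep the remainder as the message."""
    parts = line.split(" ", 3)
    if len(parts) != 4:
        return None
    timestamp, host, level, message = parts
    return {"timestamp": timestamp, "host": host, "level": level, "message": message}


if __name__ == "__main__":
    # Both approaches should yield the same fields for well-formed lines.
    assert parse_regex(LINE) == parse_dissect(LINE)
    for fn in (parse_regex, parse_dissect):
        secs = timeit.timeit(lambda: fn(LINE), number=100_000)
        print(f"{fn.__name__}: {secs:.3f}s for 100k lines")
```

Timings will vary by machine and pattern, but the split-based version does strictly less work per line, which is the same reason dissect tends to beat grok when the log format has fixed delimiters.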

I'm actually surprised Elastic did so well in this benchmark given the misconfiguration.
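On the document count mismatch: one way to make counts comparable is to generate the dataset once, with a fixed random seed, and have every platform ingest the same file. A minimal sketch, assuming JSON-lines input; the field names, the `generate_docs`/`write_dataset` helpers and the file path are hypothetical.

```python
import json
import random

LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"]


def generate_docs(n_docs, seed=42):
    """Return n_docs synthetic log documents; the same seed always yields the same list."""
    rng = random.Random(seed)  # dedicated RNG so other code cannot perturb the sequence
    return [
        {
            "id": i,  # sequential id makes dropped or duplicated documents easy to find
            "level": rng.choice(LEVELS),
            "service": f"svc-{rng.randint(1, 10)}",
            "latency_ms": rng.randint(1, 500),
        }
        for i in range(n_docs)
    ]


def write_dataset(path, n_docs, seed=42):
    """Write the documents as JSON lines; every platform then ingests this one file."""
    docs = generate_docs(n_docs, seed)
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")
    return len(docs)  # the expected document count on every platform
```

For example, `write_dataset("bench-logs.jsonl", 1_000_000)` returns the count each platform should report once ingestion finishes, and the `id` field lets you diff exactly which documents went missing.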

Thanks for the note. Our approach for this benchmark was to use the default configs that each of the logging platforms comes with.

This is also because we are not experts in Elastic or Loki, so we wouldn't know the possible impact of tuning configs. To be fair, we also didn't tune SigNoz for this specific data or test scenario and ran it with default settings.

> Graph complaining Elasticsearch using 60% available memory. This is as configured, they could use less with not much impact to performance.

This is something we discussed, and we have added a note in the benchmark blog as well. Pasting it again for reference:

> For this benchmark for Elasticsearch, we kept the default recommended heap size memory of 50% of available memory (as shown in Elastic docs here). This determines caching capabilities and hence the query performance.

We could have tried tinkering with different heap sizes (as a % of total memory), but that would impact query performance, so we kept the default Elastic recommendation.
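A rough sketch of the heap sizing rule being discussed, assuming the commonly cited guidance that the Elasticsearch heap should be at most 50% of RAM and should stay below the compressed-oops threshold. The 26 GiB cap and the helper names `suggested_es_heap_mib` and `jvm_options` are assumptions, and recent Elasticsearch versions size the heap automatically by default, so check the docs for your version. Note that 50% is an upper bound rather than a target: memory not given to the heap stays available to the OS filesystem cache, which Lucene also relies on for query performance.

```python
GIB = 1024  # MiB per GiB


def suggested_es_heap_mib(total_ram_mib, fraction=0.5, oops_cap_mib=26 * GIB):
    """Rule-of-thumb heap size: at most `fraction` of RAM, capped below an
    assumed compressed-oops threshold (26 GiB here, a conservative guess)."""
    heap = int(total_ram_mib * fraction)
    return min(heap, oops_cap_mib)


def jvm_options(heap_mib):
    """Matching -Xms/-Xmx flags; setting min == max avoids heap resizing at runtime."""
    return [f"-Xms{heap_mib}m", f"-Xmx{heap_mib}m"]
```

On an 8 GiB machine this gives a 4096 MiB heap; passing `fraction=0.25` would trade some heap-side caching for more filesystem cache, which is the kind of trade-off the thread is debating.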

Part of the issue is that Elasticsearch isn't an open-source logging platform; it's a search-oriented database. Using it effectively as a logging platform depends heavily on configuration, unlike tools that are optimized only for logs out of the box.
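As an example of the kind of config meant here (and of the index template point from the first comment), below is a sketch of a request body for `PUT _index_template/<name>` (composable index templates, Elasticsearch 7.8+). The index pattern `bench-logs-*`, the shard count and the refresh interval are hypothetical values, not tuned recommendations. One likely source of extra disk usage without a template is that dynamic mapping indexes new string fields as both `text` and a `.keyword` sub-field; the dynamic template below maps them as `keyword` only.

```python
import json


def logs_index_template(pattern="bench-logs-*", shards=3, refresh_interval="30s"):
    """Body for PUT _index_template/<name>; all values are illustrative assumptions."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": shards,            # more primaries spread indexing work
                "refresh_interval": refresh_interval,  # fewer refreshes -> cheaper bulk ingest
            },
            "mappings": {
                # Map new string fields as keyword only, instead of the default
                # text + .keyword multi-field that indexes every string twice.
                "dynamic_templates": [
                    {
                        "strings_as_keyword": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "keyword"},
                        }
                    }
                ],
                # Keep full-text search on the free-form message field.
                "properties": {"message": {"type": "text"}},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(logs_index_template(), indent=2))
```

The printed JSON can be sent with any HTTP client; explicit properties such as `message` take precedence over the dynamic template, so free text stays searchable while IDs, levels and hostnames are stored once.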

I imagine you'd have similar issues with Postgres or any general-purpose datastore without the correct configuration.

I'm not an Elastic expert either, just a developer responsible for a lot of things who can Google pretty well, and I could tell those configs seemed off. I've been hearing for years that Beats is preferable to Logstash. I don't even claim to work in the logging space :-)