by unethical_ban 158 days ago

I work in infosec and several popular platforms use elasticsearch for log storage and analysis.

I would never. Ever. Bet my savings on ES being stable enough to always be online to take in data, or predictable in retaining the data it took in.

It feels very best-effort and as a consultant, I recommend orgs use some other system for retaining their logs, even a raw filesystem with rolling zips, before relying on ES unless you have a dedicated team constantly monitoring it.
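For illustration, the "raw filesystem with rolling zips" fallback can be sketched with Python's standard logging module. This is a minimal sketch and not anything the poster described in detail: the function name, logger name, size limit, and backup count are all assumptions.

```python
import gzip
import logging
import logging.handlers
import os
import shutil


def gzip_rotator(source: str, dest: str) -> None:
    """Compress a just-rotated log file into dest, then drop the plain copy."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def make_rolling_logger(path: str, max_bytes: int = 10_000_000,
                        backups: int = 30) -> logging.Logger:
    """Return a logger that writes to path and keeps gzip-compressed backups.

    max_bytes and backups are illustrative values, not recommendations.
    """
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    # Rotated files are named app.log.1.gz, app.log.2.gz, and so on.
    handler.namer = lambda name: name + ".gz"
    handler.rotator = gzip_rotator
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))

    logger = logging.getLogger("rolling_archive:" + path)  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

Calling `make_rolling_logger("/var/log/archive/app.log").info("...")` would then append to the live file and compress older segments as they roll over. The path here is only an example.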

5 comments

kentm 157 days ago

Do you happen to know if ES was the only storage? It's been almost 8 years, but if I were building a log storage and analysis system, I'd push the logs to S3 or some other object store and build an ES index off of that S3 data. From the consumer's perspective, it may look like we're using ES to store the data, but we have a durable backup to regenerate ES if necessary.
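A rough sketch of that "archive first, index second" shape, kept self-contained on purpose: a local directory stands in for the object store and a plain dict stands in for the ES index. Both stand-ins, the function names, and the `message` field are assumptions for illustration only.

```python
import gzip
import json
import os
from collections import defaultdict


def archive_batch(archive_dir: str, batch_id: str, records: list) -> str:
    """Write one batch of log records as gzip-compressed JSON lines.

    archive_dir is a stand-in for an object store bucket.
    """
    os.makedirs(archive_dir, exist_ok=True)
    path = os.path.join(archive_dir, f"{batch_id}.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def rebuild_index(archive_dir: str) -> dict:
    """Regenerate a toy inverted index (token -> records) from the archive.

    The dict is a stand-in for the search index; the point is that it is
    derived data and can always be rebuilt from the durable copy.
    """
    index = defaultdict(list)
    for name in sorted(os.listdir(archive_dir)):
        if not name.endswith(".jsonl.gz"):
            continue
        with gzip.open(os.path.join(archive_dir, name), "rt", encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                for token in set(record["message"].lower().split()):
                    index[token].append(record)
    return dict(index)
```

If the index is lost or corrupted, `rebuild_index` is rerun over the archive; nothing in the archive ever depends on the index being healthy.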

lillesvin 157 days ago

Searchable snapshots in Elasticsearch can be backed by S3 and they perform very well. No need to store the data on hot nodes any longer than it takes for the index to do a rollover, and from then it's all S3.
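As a hedged sketch of what such a setup might look like, the snippet below builds the body of an index lifecycle management (ILM) policy that rolls over in the hot phase and then moves the index to a searchable snapshot. The repository name and the rollover thresholds are assumptions, and the repository would need to be registered as an S3-backed snapshot repository beforehand.

```python
import json

# Hypothetical name of a snapshot repository that was registered beforehand.
SNAPSHOT_REPOSITORY = "logs-s3-repo"


def build_ilm_policy(repository: str = SNAPSHOT_REPOSITORY) -> dict:
    """Return an ILM policy body: roll over while hot, then serve from snapshot.

    The thresholds are illustrative, not tuning advice.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": "1d",
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                "frozen": {
                    # Move on as soon as the index has rolled over.
                    "min_age": "0d",
                    "actions": {
                        "searchable_snapshot": {
                            "snapshot_repository": repository
                        }
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # The printed JSON is what would be sent to the ILM policy endpoint.
    print(json.dumps(build_ilm_policy(), indent=2))
```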

toenail 158 days ago

Dunno, I've had three node clusters running very stable for years. Which issues did you have that require a full team?

PedroBatista 158 days ago

Even most toy databases "built in a weekend" can be very stable for years if:

- No edge-case is thrown at them

- No part of the system is stressed (software modules, OS, firmware, hardware)

- No plug is pulled

Crank the requests to 11 or import a billion rows of data with another billion relations and watch what happens. The main problem isn't the system refusing to serve a request or throwing "No soup for you!" errors, it's data corruption and/or wrong responses.

toenail 157 days ago

I'm talking about production loads, but thanks.

pixl97 157 days ago

Production loads mean a lot of different things to a lot of different people.

unethical_ban 158 days ago

To be fair, I think it is chronically underprovisioned clusters that get overwhelmed by log forwarding. I wasn't on the team that managed the ELK stack a decade ago, but I remember our SOC having two people whose full time job was curating the infrastructure to keep it afloat.

Now I work for a company whose log storage product has ES inside, and it seems to shit the bed more often than it should - again, could be bugs, could be running "clusters" of 1 or 2 instead of 3.

xeraa 157 days ago

There are no 2-node clusters (it needs a quorum). If your setup has 2-node clusters, someone is doing this horribly wrong.
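The arithmetic behind the quorum point can be sketched as follows. This is generic majority-quorum math, not Elasticsearch's actual election code, and the function names are made up for illustration.

```python
def quorum(voting_nodes: int) -> int:
    """Smallest strict majority of the voting nodes."""
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes: int) -> int:
    """How many voting nodes can be lost while a majority still exists."""
    return voting_nodes - quorum(voting_nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "nodes: quorum", quorum(n), "tolerates", tolerated_failures(n))
    # 1 nodes: quorum 1 tolerates 0
    # 2 nodes: quorum 2 tolerates 0
    # 3 nodes: quorum 2 tolerates 1
    # 5 nodes: quorum 3 tolerates 2
```

Under this model a second node adds no fault tolerance over a single node, which is why three is the usual minimum.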

toenail 158 days ago

I'm not even sure "get overwhelmed" is a problem, unless you need real time analytics. But yeah, sounds like a resources issue.

yencabulator 154 days ago

> I work in infosec and several popular platforms use elasticsearch for log storage and analysis.

Storing logs in Elasticsearch is just stupid, as it does not preserve order:

https://logstash.jira.com/browse/LOGSTASH-192

1_1xdev1 157 days ago

You have to slap something durable and a queue in front of it.

Elastic’s own consultants will tell you this …
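A minimal sketch of "something durable and a queue in front", assuming a single writer and a local disk: events are fsynced to an append-only journal first, and a separate drain step forwards them to the indexer, committing its position only after each successful delivery. The class, file names, and the `sink` callable are all hypothetical, and delivery is at-least-once.

```python
import json
import os
from typing import Callable


class DiskSpool:
    """Append-only journal plus a committed-offset file (hypothetical design)."""

    def __init__(self, directory: str) -> None:
        os.makedirs(directory, exist_ok=True)
        self.journal_path = os.path.join(directory, "journal.jsonl")
        self.offset_path = os.path.join(directory, "committed.offset")

    def append(self, event: dict) -> None:
        """Persist the event before anything tries to index it."""
        with open(self.journal_path, "ab") as f:
            f.write(json.dumps(event).encode("utf-8") + b"\n")
            f.flush()
            os.fsync(f.fileno())

    def _committed(self) -> int:
        try:
            with open(self.offset_path) as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def _commit(self, offset: int) -> None:
        # Write to a temp file and rename so the offset is never half-written.
        tmp = self.offset_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(offset))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.offset_path)

    def drain(self, sink: Callable[[dict], None]) -> int:
        """Forward undelivered events to sink; stop at the first failure."""
        if not os.path.exists(self.journal_path):
            return 0
        delivered = 0
        with open(self.journal_path, "rb") as f:
            f.seek(self._committed())
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # end of file, or a partially written record
                try:
                    sink(json.loads(line))
                except Exception:
                    break  # indexer unavailable; retry on the next drain
                self._commit(f.tell())
                delivered += 1
        return delivered
```

If the indexer is down, `append` keeps accepting data and `drain` simply picks up where it left off later. A real deployment would more likely use an existing message queue, and would also need journal truncation, which this sketch omits.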

cyberpunk 157 days ago

Meh, I run hundreds of ES nodes. It's gotten a lot friendlier these days, but yes, it can be a bit unforgiving at times.

Turns out running complicated large distributed systems requires a bit more than a ./apply, who would have guessed it?
