| see to me, having at one point been responsible for maintaining an ES instance for logs (and exporters and all the other bits) I feel like the prices you pay in engineering hours and hardware costs to maintain all those indexes while keeping ES from absolutely melting down is way too high. I think grep is amazing but yes if you unleash it on 'all the logs' without narrowing yourself down to a time frame first or some other taxonomy is going to be slow. This seems like a skill issue, frankly. Also full text indexes for all the things are generally FASTER of course, but seconds/milliseconds? How much hardware are you throwing at logs? Most only go to logs in an emergency, during an incident and the like. How much are you paying just to index a bunch of shit that will probably never even be looked at, and how much are you paying for hardware to run queries on those indexes that will be largely idle? The problem with ES/Splunk for logs is that they were not designed for logs, so they are both, in my view, overkill AND underkill for the task. Full fuzzy text search is probably overkill, the UI for the task of dealing with log data is underkill. (The cloud bills are certainly overkill) I'm currently doing platform engineering at a company in the top half of the Fortune 500. Honestly, probably about 90-95% of the time when I'm helping a team troubleshoot their service on kubernetes I'm using the kubectl `stern` plugin (shows log streams from all pods that match a label query) and grep/sed/awk/jq if it's ongoing, it's just waaaaay more responsive. If it's a 'weird thing happened last night, investigate' task and I have to go to Kibana it's just a much worse experience overall. |
To search multiple TBs of logs, all you need is a single $40/month server with an 8 TB SSD, running sensible software with a sensible index algorithm.
I agree that ElasticSearch is bloated and needs undue engineering time. But it doesn't need to be that way.
For example, Quickwit finds things in under a second.
It's a huge improvement when queries go from a 10-minute linear search to instant.
(Its index is still not perfect for me because it doesn't fully support simple exact prefix/infix search, but otherwise it does the job fast with few resources.)
> Full fuzzy text search is probably overkill
Yes, I think most people don't need fuzzy search for log search. They just need indexed grep.
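To make "indexed grep" concrete, here is a toy sketch of one well-known way to do it: a trigram index. This is my own illustration, not how Quickwit or any other product works internally; the names `TrigramIndex`, `add`, and `search` are made up for the example, and a real system would keep the index on disk rather than in memory.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy in-memory index (hypothetical): trigram -> line numbers containing it."""

    def __init__(self):
        self.lines = []
        self.postings = defaultdict(set)

    def add(self, line):
        line_no = len(self.lines)
        self.lines.append(line)
        for gram in trigrams(line):
            self.postings[gram].add(line_no)

    def search(self, needle):
        grams = trigrams(needle)
        if not grams:
            # Needle shorter than 3 characters: nothing to look up,
            # so fall back to scanning every line.
            candidates = range(len(self.lines))
        else:
            # A matching line must contain every trigram of the needle,
            # so intersect the posting sets (smallest first).
            sets = sorted((self.postings.get(g, set()) for g in grams), key=len)
            candidates = set.intersection(*sets)
        # Trigram overlap is necessary but not sufficient: confirm each
        # candidate with an exact substring check, like grep would.
        return [self.lines[i] for i in sorted(candidates) if needle in self.lines[i]]
```

The point is that the exact substring check only runs on the few lines that share all trigrams with the query, instead of on every line ever logged. It also handles infix matches, since any substring of 3+ characters can be looked up.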
> I think grep is amazing but yes if you unleash it on 'all the logs' without narrowing yourself down to a time frame first or some other taxonomy is going to be slow. This seems like a skill issue, frankly.
Right, grep is not the tool for the job. Using it neglects all the sensible algorithms that solve this problem. It's like saying "I don't use binary search, only linear search" and then spending human effort pre-selecting the range so that it's fast enough.
When you're searching for the rare bugs, you also can't just limit the time frame.