| HN Mirror


by sleepybrett 480 days ago

See, having at one point been responsible for maintaining an ES instance for logs (plus the exporters and all the other bits), I feel like the price you pay in engineering hours and hardware costs to maintain all those indexes while keeping ES from absolutely melting down is way too high.

I think grep is amazing, but yes, if you unleash it on 'all the logs' without narrowing yourself down to a time frame first or some other taxonomy, it's going to be slow. This seems like a skill issue, frankly.
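To make the "narrow to a time frame first" approach concrete, here is a minimal Python sketch. It assumes a single log file whose lines start with an ISO-8601 timestamp and are already sorted by time; the helper names (`parse_ts`, `first_at_or_after`, `grep_window`) are hypothetical. Binary search locates the window while parsing only O(log n) timestamps, and only the lines inside the window are scanned grep-style:

```python
from datetime import datetime


def parse_ts(line):
    # Assumption: each line begins with an ISO-8601 timestamp followed by a space.
    return datetime.fromisoformat(line.split(" ", 1)[0])


def first_at_or_after(lines, t):
    """Binary search: index of the first line whose timestamp is >= t."""
    lo, hi = 0, len(lines)
    while lo < hi:
        mid = (lo + hi) // 2
        if parse_ts(lines[mid]) < t:
            lo = mid + 1
        else:
            hi = mid
    return lo


def grep_window(lines, start, end, needle):
    """Lines with start <= timestamp < end that contain needle."""
    # Only O(log n) timestamps are parsed to find the window boundaries...
    lo = first_at_or_after(lines, start)
    hi = first_at_or_after(lines, end)
    # ...then the window itself is scanned linearly, like grep.
    return [line for line in lines[lo:hi] if needle in line]


logs = [
    "2024-01-01T00:00:00 INFO start",
    "2024-01-01T01:00:00 ERROR disk full",
    "2024-01-01T02:00:00 INFO ok",
    "2024-01-01T03:00:00 ERROR timeout",
]
print(grep_window(logs, datetime(2024, 1, 1, 0, 30), datetime(2024, 1, 1, 2, 30), "ERROR"))
# → ['2024-01-01T01:00:00 ERROR disk full']
```

The speedup depends entirely on how narrow the window is: with no known time range, the window is the whole file and this degrades to a plain linear scan.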

Also, full-text indexes for all the things are generally FASTER, of course, but seconds vs. milliseconds? How much hardware are you throwing at logs? Most people only go to the logs in an emergency, during an incident and the like. How much are you paying just to index a bunch of shit that will probably never even be looked at, and how much are you paying for hardware to run queries on those indexes that will sit largely idle?

The problem with ES/Splunk for logs is that they were not designed for logs, so in my view they are both overkill AND underkill for the task. Full fuzzy text search is probably overkill; the UI for dealing with log data is underkill. (The cloud bills are certainly overkill.)

I'm currently doing platform engineering at a company in the top half of the Fortune 500. Honestly, about 90-95% of the time when I'm helping a team troubleshoot their service on Kubernetes, I'm using the kubectl `stern` plugin (which shows log streams from all pods matching a label query) plus grep/sed/awk/jq if the issue is ongoing; it's just waaaaay more responsive. If it's a 'weird thing happened last night, investigate' task and I have to go to Kibana, it's a much worse experience overall.


nh2 480 days ago

It should not take engineering time to have a database compute full-text indices. In sensible systems, you run "CREATE INDEX" and you're done.

To search multiple TB of logs, you need a single $40/month server with an 8 TB SSD running sensible software and a sensible index algorithm.

I agree that Elasticsearch is bloated and needs undue engineering time. But it doesn't need to be that way.

For example Quickwit finds things subsecond.

It's a huge improvement when queries go from 10 minutes linear search to instant.
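A back-of-envelope estimate shows why a linear scan over a couple of TB lands in the 10-minute range. The sequential read throughput $B$ of about 3 GB/s is an assumption (roughly a fast NVMe SSD), not a figure from the thread, and $S$ is the total log size:

$$
t_{\text{scan}} \approx \frac{S}{B} = \frac{2 \times 10^{12}\ \text{B}}{3 \times 10^{9}\ \text{B/s}} \approx 667\ \text{s} \approx 11\ \text{min}
$$

An index lookup instead reads only the posting lists for the query terms, so its cost roughly tracks the number of matches rather than $S$.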

(Quickwit's index is still not perfect for me because it doesn't fully support simple exact prefix/infix search, but otherwise it does the job fast with few resources.)
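For context on what exact prefix search needs from an index: a common generic approach keeps the term dictionary sorted, so all terms sharing a prefix are contiguous and can be found with one binary search. This is a hypothetical sketch (whitespace tokenization, in-memory dict, made-up function names), not how Quickwit or any particular engine implements it:

```python
import bisect


def build_term_dict(lines):
    """Map each whitespace-separated term to the sorted line ids containing it."""
    postings = {}
    for i, line in enumerate(lines):
        for term in set(line.split()):
            # Line ids are appended in increasing order, so each list stays sorted.
            postings.setdefault(term, []).append(i)
    # A sorted term list lets prefix queries use binary search.
    return sorted(postings), postings


def prefix_search(terms, postings, prefix):
    """Sorted line ids containing any term that starts with prefix."""
    hits = set()
    # Jump to the first term >= prefix; all matching terms are contiguous from there.
    i = bisect.bisect_left(terms, prefix)
    while i < len(terms) and terms[i].startswith(prefix):
        hits.update(postings[terms[i]])
        i += 1
    return sorted(hits)


lines = ["GET /api/users 200", "GET /api/orders 500", "POST /login 401"]
terms, postings = build_term_dict(lines)
print(prefix_search(terms, postings, "/api/"))  # → [0, 1]
```

Infix (substring) search doesn't fall out of a sorted term list this way, because the matching terms aren't contiguous; that typically needs something like an n-gram index.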

> Full fuzzy text search is probably overkill

Yes, I think most people don't need fuzzy search for log search. They just need indexed grep.
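One way to build the "indexed grep" described here is a trigram index: every line is indexed by its 3-character substrings, a query intersects the posting sets for the needle's trigrams, and only the surviving candidate lines are checked with a real substring match. This is a toy in-memory sketch (the class and method names are hypothetical); real systems keep compressed posting lists on disk:

```python
def trigrams(s):
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Toy 'indexed grep': exact substring search over log lines."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.postings = {}  # trigram -> set of line ids
        for i, line in enumerate(self.lines):
            for g in trigrams(line):
                self.postings.setdefault(g, set()).add(i)

    def search(self, needle):
        grams = trigrams(needle)
        if not grams:
            # Needles shorter than 3 characters can't use the index; fall back to a scan.
            candidates = range(len(self.lines))
        else:
            # A line can only contain needle if it contains every trigram of needle.
            sets = [self.postings.get(g, set()) for g in grams]
            candidates = sorted(set.intersection(*sets))
        # Verify candidates, since sharing all trigrams doesn't guarantee a match.
        return [self.lines[i] for i in candidates if needle in self.lines[i]]


idx = TrigramIndex(["user=alice status=500", "user=bob status=200", "timeout after 30s"])
print(idx.search("status=5"))  # → ['user=alice status=500']
```

Because the final check is an exact substring match, the results are what `grep -F` would return for the same string; the index only cuts down how many lines get checked, and it handles infix matches naturally.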

> I think grep is amazing, but yes, if you unleash it on 'all the logs' without narrowing yourself down to a time frame first or some other taxonomy, it's going to be slow. This seems like a skill issue, frankly.

Right, grep is not the tool for the job. It ignores all the sensible algorithms that solve this problem. It's like saying "I don't use binary search, only linear search" and then spending human effort pre-selecting the range so that it's fast enough.

When you're searching for rare bugs, you also can't just limit the time frame.


sleepybrett 477 days ago

I think we might be working at different scales; a $40/month server with 8 TB of disk would be a puddle on the floor in my current circumstances.

The problem is that many, if not most, applications have their own log structure, so just saying 'index it' doesn't cut it at all.
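As an illustration of what "just index it" has to cope with when formats are mixed, this hypothetical `index_terms` tokenizer (an assumption, not something either commenter proposed) indexes JSON lines as `key=value` terms plus the words inside string values, and falls back to plain word tokens for unstructured lines. The regex defining what counts as a word is also an assumption:

```python
import json
import re

# Assumption: a "word" is a run of letters, digits, and a few path/URL characters.
WORD = re.compile(r"[A-Za-z0-9_./:-]+")


def index_terms(line):
    """Set of terms to index for one log line, whatever its format."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        record = None
    if isinstance(record, dict):
        # Structured line: index each top-level field as key=value,
        # plus the words inside string values for free-text matching.
        terms = set()
        for key, value in record.items():
            terms.add(f"{key}={value}")
            if isinstance(value, str):
                terms.update(WORD.findall(value))
        return terms
    # Unstructured line (or JSON that isn't an object): plain word tokens.
    return set(WORD.findall(line))


print(sorted(index_terms('{"level": "error", "msg": "disk full"}')))
# → ['disk', 'error', 'full', 'level=error', 'msg=disk full']
```

This only covers top-level JSON fields and simple tokens; app-specific formats such as logfmt-style lines or multi-line stack traces would each need their own parser, which is the kind of per-application work the comment above is pointing at.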


nh2 464 days ago

I'm not sure I understand.

I was talking about what it takes to search through a couple TB of logs. I said that with grep and Loki it's slow due to the linear search, and that indexing systems make it much faster (from many minutes to subsecond).

That's independent of whether you have more than just a couple TB of logs. If you have more, you just get more servers. You'll still get the subsecond results that I find so beneficial.
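"Just get more servers" usually means a scatter-gather query: the logs are partitioned across shards, each shard answers the query against its own data, and a coordinator merges the results. In this sketch `Shard` is a hypothetical stand-in that does a linear scan to stay short; in a real deployment each shard would hold its own index and sit behind a network call:

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical stand-in for one index server holding part of the logs."""

    def __init__(self, lines):
        self.lines = lines

    def search(self, needle):
        # A real shard would consult its own index; a scan keeps the sketch short.
        return [line for line in self.lines if needle in line]


def scatter_gather(shards, needle):
    # Send the query to every shard concurrently. With remote (I/O-bound) shards,
    # the waits overlap, so latency is roughly the slowest shard, not the sum.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = pool.map(lambda shard: shard.search(needle), shards)
        # Merge per-shard hits; map() preserves the shard order.
        return [hit for hits in per_shard for hit in hits]


shards = [Shard(["a error", "b ok"]), Shard(["c error"]), Shard([])]
print(scatter_gather(shards, "error"))  # → ['a error', 'c error']
```

Per-query latency stays roughly flat as shards are added only while each shard's share of the data stays bounded; the slowest shard and the merge step set the floor.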
