by moonshinefe 4057 days ago
Yes, grepping logs is terrible if "you have 100Gb of logs a day". I'm not sure why the author thinks his use case is anywhere near the norm, or why he's shocked that in most use cases people prefer text files.

I'm also not getting why he doesn't just use scripts to parse the logs and insert them into a database at that point. Why use some ad-hoc binary logging format if you're doing complex queries that SQL would be better suited for anyway, on proven DB systems?

Maybe I'm missing something.
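
For what it's worth, the "script that parses the logs and inserts them into a database" idea can be sketched in a few lines. This is only an illustration, not anything from the article: the line format, the `logs` table and the `load_logs` helper are all made up for the example, and SQLite is used just because it ships with Python.

```python
import re
import sqlite3

# Assumed (hypothetical) line format: "<timestamp> <host> <service>: <message>"
LINE_RE = re.compile(r"^(\S+) (\S+) (\S+?): (.*)$")

def load_logs(lines, conn):
    """Parse text log lines into a SQLite table; return the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs (ts TEXT, host TEXT, service TEXT, message TEXT)"
    )
    rows = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:  # lines that do not fit the assumed format are skipped
            rows.append(m.groups())
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

# Made-up sample data, for illustration only.
sample = [
    "2015-02-20T10:00:01Z web1 nginx: GET /index.html 200",
    "2015-02-20T10:00:02Z web1 sshd: Failed password for root from 10.0.0.5",
    "2015-02-20T10:00:03Z db1 sshd: Failed password for admin from 10.0.0.5",
    "not a log line",
]
conn = sqlite3.connect(":memory:")
inserted = load_logs(sample, conn)

# The kind of query that is awkward with grep but trivial in SQL.
failed_per_host = conn.execute(
    "SELECT host, COUNT(*) FROM logs WHERE service = 'sshd' GROUP BY host ORDER BY host"
).fetchall()
print(inserted, failed_per_host)
```

Whether this holds up at 100Gb a day is a separate question; the sketch only shows that text logs and SQL queries are not mutually exclusive.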

7 comments

Grepping is just fine if you have a few hundred megabytes of data a day, so wanting to kill text-based logging because YOU reached multiple gigabytes a day is going to be met with resistance from the people who don't have those issues.

As the author himself points out: "I'm sorry, but deciding how much and what we log is not your job. Its ours, and this is the amount we have to deal with."

That goes both ways. If I only have one or two servers, having to run a centralized logging service doesn't scale either; the overhead is not worth the trouble.

If I want to look for an IP in logs from multiple services, text files are perfect. Doing the same across multiple servers, yes, then you want centralized logging. Binary logging ruins the first case, while text-based logging works in both (sort of).
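
To make the first case concrete, here is a rough sketch of looking for one IP across several services' text logs on a single box. The directory layout, the `*.log` naming and the `find_ip` helper are assumptions for the example; in practice a plain `grep` over the log directory does the same job.

```python
import re
import tempfile
from pathlib import Path

def find_ip(log_dir, ip):
    """Return (file name, line number, line) for every line mentioning `ip`."""
    # Guard against partial matches such as 10.0.0.55 when searching for 10.0.0.5.
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)")
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern.search(line):
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits

# Made-up logs from two hypothetical services.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "nginx.log").write_text("10.0.0.5 GET /\n10.0.0.55 GET /admin\n")
    Path(tmp, "sshd.log").write_text(
        "Accepted key for bob from 192.168.1.2\n"
        "Failed password for root from 10.0.0.5\n"
    )
    hits = find_ip(tmp, "10.0.0.5")

for name, lineno, line in hits:
    print(f"{name}:{lineno}: {line}")
```

No daemon, no index, no schema: that is the "scale down" case.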

I don't really see the point of binary logs. Either you're small enough that text files won't be an issue, or you're large enough to have centralized logging.

It seems that there's a push towards "scalable solutions" for everything, but people keep forgetting that you need to scale down as well. Most of us will never have to run more than a handful of servers, and in these cases the Twitter/Google/Facebook-like infrastructure just isn't worth the hassle.

I think I'm missing the same thing. He keeps going on about structure, but it wasn't obvious to me where the solution (?) actually introduces query-able structure.

He needs a log database, clearly. And when you put it that way, it's obvious why grepping logs is a nice, quick solution in many cases when you aren't getting "100Gb of logs a day".

I think his point was that querying structured data is better than grepping unstructured text. SQL vs. regex, for example. I get the impression that he didn't state what solution to use, but simply that binary/structured > text/unstructured. He even says that Journal isn't his ideal solution and never will be.
Throwing 100GB of data into a relational database and being able to run your queries quickly isn't exactly a no-brainer.
One of the initial challenges I see from an ops perspective is that the most recent logs are often the most interesting. The latency of the logs being ingested into a DB would prevent me from using the DB. Generally, I find myself grepping logs on the prod servers.
"you have 100Gb of logs a day"

Logs have lots of redundancy, so they compress quite nicely. So it is actually practical to grep those files, since on disk they are not so large, and 100Gb of data decompressed in memory is not a problem to grep.
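
A small sketch of what I mean, using Python's standard `gzip` module to search a compressed log while decompressing it on the fly (roughly what `zgrep` does). The sample log lines and the `grep_gz` helper are invented for the example, and real logs will not compress as well as this artificially repetitive data.

```python
import gzip
import os
import tempfile

def grep_gz(path, needle):
    """Yield lines from a gzip-compressed log that contain `needle`."""
    # Text mode decompresses as a stream, so the whole file never has to fit in memory.
    with gzip.open(path, "rt", errors="replace") as fh:
        for line in fh:
            if needle in line:
                yield line.rstrip("\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log.gz")
    # Made-up, highly repetitive log data with one interesting line in the middle.
    lines = [
        f"2015-02-20T10:00:{i % 60:02d}Z web1 nginx: GET /index.html 200\n"
        for i in range(10000)
    ]
    lines[1234] = "2015-02-20T10:20:34Z web1 sshd: Failed password for root from 10.0.0.5\n"
    raw = "".join(lines).encode()
    with gzip.open(path, "wb") as fh:
        fh.write(raw)

    compressed_size = os.path.getsize(path)
    matches = list(grep_gz(path, "Failed password"))

print(f"raw: {len(raw)} bytes, compressed: {compressed_size} bytes")
print(matches)
```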

The author shows a use case for both a small and a large logging system. The use case is complex queries that span multiple applications and call for sensible queries rather than regex ninja skills.
He does not. His small logging system is not small at all; it spans multiple systems and has requirements that are not at all typical for small systems.
My small system is usually two computers. Having a router/proxy/firewall box at home is not all that uncommon, and some examples I gave apply there nicely.
Do you really think that your requirements apply? I think none do in the case of a really small system. See, I think I understand your requirements, but I think they are not common at all.

For a small system - a desktop PC and maybe a custom router box - you do not need one central place for the logs. Thus you don't need an easy way to change it. You don't need to preserve logs in a more efficient way than logrotate does. They don't need to be stored with more structure than the filesystem provides, the queries are local, and grep is more than efficient enough.

Maybe a binary log is the best choice for you - it seems to be what you want. But that does not generalize to the general public. That is why the rant feels very misplaced to me.

You're missing the point. I'm not using a custom logging format. I'm using binary log storage, with emphasis on the storage. There is a database and a search engine behind it.

Logging format and log storage format are two very different things.

Also, I'm not shocked that people prefer text files. I'm shocked that they're so much against binary log storage. There's an important distinction between the two: you can prefer text, if that fits your case better, without hating on binary storage.

> There's an important distinction between the two: you can prefer text, if that fits your case better, without hating on binary storage.

Except according to the article (which you posted and are defending all over this thread, so I'm guessing you actually wrote it?) the author has NO intention of honoring those who prefer text logs, in fact using the phrase "so vigilantly against text based log storage". To use your own reply, you can prefer binary, if it fits your case better, BUT DON'T HATE ON TEXT STORAGE.