by thedevopsguy 4077 days ago
Log analytics is a big topic, so I'll hit the main points. The approach you take to logging depends on the analysis you want to do after a log event has been recorded. The value of logs diminishes rapidly as the events age. Most places want to keep logs hot for somewhere between a day and a week. After that, the logs are compressed using gzip or Google's Snappy compression. Even in compressed form they should still be searchable.
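A minimal Python sketch of that hot/cold split, assuming plain-text `.log` files in a single directory and a 7-day hot window (both are assumptions for illustration). It uses gzip only because it ships with Python's standard library; Snappy would need a third-party package. The point is that compressed files stay searchable by streaming them through `gzip.open` in text mode, without unpacking them to disk first.

```python
import gzip
import os
import shutil
import time


def compress_old_logs(log_dir, max_age_days=7):
    """Gzip plain .log files older than max_age_days and remove the originals."""
    cutoff = time.time() - max_age_days * 86400
    compressed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if name.endswith(".log") and os.path.getmtime(path) < cutoff:
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)
            compressed.append(name + ".gz")
    return compressed


def search_logs(log_dir, needle):
    """Yield (file, line) for matches in both hot (.log) and cold (.log.gz) files."""
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if name.endswith(".log.gz"):
            opener = gzip.open
        elif name.endswith(".log"):
            opener = open
        else:
            continue
        # "rt" decompresses gzip on the fly while reading, no temp file needed
        with opener(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                if needle in line:
                    yield name, line.rstrip("\n")
```

In practice a log shipper or rotation tool usually does the compression step; this only shows the mechanics.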

The most common logging formats I've come across in production environments are:

1. log4j (Java) or NLog (.NET)

2. JSON

3. syslog
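Of these, JSON and syslog are the easiest to parse generically, since a log4j/NLog line looks like whatever layout pattern you configured. A rough Python sketch, assuming one JSON object per line and classic BSD-style (RFC 3164) syslog lines; it does not handle the newer RFC 5424 syslog format.

```python
import json
import re

# RFC 3164-style line: "<PRI>Mmm dd hh:mm:ss host tag: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<message>.*)$"
)


def parse_syslog(line):
    """Return a dict of syslog fields, or None if the line doesn't match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    event = m.groupdict()
    pri = int(event.pop("pri"))
    # PRI encodes facility * 8 + severity
    event["facility"], event["severity"] = divmod(pri, 8)
    return event


def parse_json(line):
    """JSON logs are already key/value pairs; one object per line is assumed."""
    return json.loads(line)
```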

Tools that I've used to search, visualize and analyse log data have been:

1. Elasticsearch, Logstash and Kibana (ELK) stack

2. Splunk (commercial)

3. Logscape (commercial)

With the database approach, changing the fields that represent your data is expensive because you are locked in by the schema, and the schema will never fully capture your understanding of the data. With the tools I've mentioned above, you have the option to extract ad-hoc fields at query time.
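A small Python sketch of what extracting ad-hoc fields at query time (often called schema-on-read) looks like: the raw lines stay untouched, and a field only exists while a query pulls it out with a pattern. The sample lines and the `extract_fields` helper are hypothetical, not how any of the tools above implement it.

```python
import re
from collections import Counter


def extract_fields(lines, pattern):
    """Yield the named groups of `pattern` for every line it matches."""
    rx = re.compile(pattern)
    for line in lines:
        m = rx.search(line)
        if m:
            yield m.groupdict()


# Hypothetical raw log lines; no schema was declared for them up front.
raw = [
    "2015-03-01 12:00:01 INFO checkout user=alice took=120ms",
    "2015-03-01 12:00:02 WARN checkout user=bob took=950ms",
    "2015-03-01 12:00:03 INFO login user=alice",
]

# Two different "schemas" applied to the same lines at query time.
durations = [int(f["ms"]) for f in extract_fields(raw, r"took=(?P<ms>\d+)ms")]
per_user = Counter(f["user"] for f in extract_fields(raw, r"user=(?P<user>\w+)"))
print(durations)  # [120, 950]
print(per_user)   # Counter({'alice': 2, 'bob': 1})
```

Adding a new field later means writing a new pattern, not migrating a table.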

Hope this helps.


We're currently evaluating options, but for .NET, Serilog is shaping up extremely nicely, and Seq and Loggly are nice as log sinks...

Seq is great because you can set up your own instance very close to your servers for low-latency, high-bandwidth logging, which really changes the game in terms of what you can feasibly log (both performance-wise and financially). It also has some decent visualization options and some great integrations, with a decent plugin architecture for writing your own real-time log processing code.

Loggly has some amazing GUI/search options.

We've been using Serilog/Seq and we're extremely happy with it. I'm a little surprised that you didn't mention the buzzword "Structured Logging", which is the special sauce that makes Serilog stand out. Instead of concatenating values into strings, you attach names to values, which you can later search on. For example,

    Log.Information("Customer# {customerNumber} completed transaction {transactionId}", customerNumber, transactionId);

Then, in the Seq log viewer, you can simply click on "transactionId" in the log line and filter by "transactionId = 456" or whatever. It's one of the most exciting advancements I've seen in the .NET logging world.
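To make the idea concrete, here is a tiny Python sketch of message-template logging. `log_event` is a hypothetical helper, not Serilog's API: it renders the human-readable message and also keeps each `{name}` hole as a property, so you can filter on `transactionId == 456` instead of grepping the text. Real Serilog also handles format specifiers, object destructuring and more, which this skips.

```python
import re

HOLE = re.compile(r"\{(\w+)\}")


def log_event(template, *args):
    """Render `template` positionally and keep the named values as properties."""
    names = HOLE.findall(template)
    if len(names) != len(args):
        raise ValueError("template expects %d values, got %d" % (len(names), len(args)))
    values = iter(args)
    # Each {hole} is replaced by the next argument, in order of appearance.
    rendered = HOLE.sub(lambda m: str(next(values)), template)
    return {"template": template, "message": rendered, "properties": dict(zip(names, args))}


TEMPLATE = "Customer# {customerNumber} completed transaction {transactionId}"
events = [log_event(TEMPLATE, 123, 456), log_event(TEMPLATE, 124, 457)]

# Filtering on a property, not on substrings of the rendered text.
hits = [e for e in events if e["properties"]["transactionId"] == 456]
print(hits[0]["message"])  # Customer# 123 completed transaction 456
```

Keeping the template itself on the event also lets you group every occurrence of the same message regardless of its values.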

EDIT: I realized I didn't really answer OP's question about space. If you use Serilog, you can set up different sinks to export to, each with different options. For example, you could send all your logs to MongoDB and only a rolling week of recent logs to the Seq server.
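The multi-sink idea, sketched in Python under stated assumptions: `ArchiveSink` stands in for a long-term store such as MongoDB and `RollingSink` for a hot search server. All three classes are hypothetical, and the rolling window assumes events arrive in time order.

```python
from datetime import datetime, timedelta


class ArchiveSink:
    """Keeps every event (stand-in for a long-term store)."""

    def __init__(self):
        self.events = []

    def emit(self, event):
        self.events.append(event)


class RollingSink:
    """Keeps only events within `window` of the newest one (stand-in for a hot server)."""

    def __init__(self, window=timedelta(days=7)):
        self.window = window
        self.events = []

    def emit(self, event):
        self.events.append(event)
        # Assumes in-order arrival: the incoming event is the newest.
        cutoff = event["time"] - self.window
        self.events = [e for e in self.events if e["time"] > cutoff]


class Logger:
    """Fans each event out to every configured sink."""

    def __init__(self, *sinks):
        self.sinks = sinks

    def log(self, time, message):
        event = {"time": time, "message": message}
        for sink in self.sinks:
            sink.emit(event)
```

Each sink owns its own retention policy, so the application code logs once and never needs to know where events end up.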

When I've heard "structured logging" used, it has usually meant explicit key-value pairs rather than just keywords next to values, e.g.

    Log.Info("customerNum={customerNumber} transactionId={transactionId} state=completed", cN, tID)
or the ever-popular Logstash-style format:

    Log.Info(LogState.Add("state", "completed").Add("customerId", customerId).Add("transactionId", transactionId));
where `LogState` would build up a key-value dict and its `ToString` would emit the Logstash JSON format.
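A Python version of that hypothetical `LogState` builder, assuming one JSON object per log line is what your Logstash input expects. `add` returns the builder itself so calls chain as in the example above, and `str()` plays the role of `ToString`. The exact fields a real Logstash pipeline needs depend on its configuration, so this emits only what you add.

```python
import json


class LogState:
    """Hypothetical chainable key/value builder that serializes to one JSON line."""

    def __init__(self):
        self._fields = {}

    def add(self, key, value):
        self._fields[key] = value
        return self  # returning self is what makes .add(...).add(...) chain

    def __str__(self):
        # dicts keep insertion order, so keys appear in the order they were added
        return json.dumps(self._fields)


line = str(LogState().add("state", "completed").add("customerId", 42).add("transactionId", 456))
print(line)  # {"state": "completed", "customerId": 42, "transactionId": 456}
```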

I guess the version that works best depends on the tool that is consuming the log text.
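For instance, a consumer reading the `key=value` form has to split and unquote the pairs itself, and every value comes back as a string, while the JSON form keeps numbers as numbers. A rough Python sketch, assuming values containing spaces are double-quoted; `parse_kv` is a hypothetical helper.

```python
import json
import shlex


def parse_kv(line):
    """Parse 'key=value key2="quoted value"' into a dict; values stay strings."""
    # shlex.split honours the double quotes and strips them from each token
    pairs = (tok.split("=", 1) for tok in shlex.split(line) if "=" in tok)
    return {key: value for key, value in pairs}


kv = parse_kv('customerNum=123 transactionId=456 state=completed note="paid in full"')
js = json.loads('{"customerNum": 123, "transactionId": 456, "state": "completed"}')
print(kv["transactionId"], js["transactionId"])  # 456 456 (a str vs. an int)
```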

In the end, Serilog, depending on the sink, renders your log line to look like the template and attaches the metadata (the template's property names and their values) to the message itself.
Are you able to use Serilog for metrics in addition to application events? I'm thinking of something like the average time a method takes to execute. If so, what tools do you use to comb through that data (to determine average execution times, for example)?

Right now at work everyone just logs to a single CSV with an inconsistent format and it makes me cringe every time I look at it. It's also really difficult to parse.

I recently used the SerilogMetrics [1] NuGet package to measure the elapsed time between method calls. It worked great, although I couldn't figure out how to reuse my standard logging config carried in the static logger object, so I had to redefine, in the class itself, the Seq server I wanted those lines logged to. This may just have been unfamiliarity on my part.
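Outside .NET, the same "time each call and average it" idea is easy to sketch. A minimal Python version, assuming in-process storage is fine for illustration; `timed` and `average_ms` are hypothetical helpers, not SerilogMetrics. In a real setup you would log the elapsed time as a structured property and let the log server do the averaging.

```python
import functools
import time
from collections import defaultdict

timings = defaultdict(list)  # function name -> list of elapsed seconds


def timed(func):
    """Record the wall-clock duration of every call to `func`, even if it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__].append(time.perf_counter() - start)
    return wrapper


def average_ms(name):
    """Mean duration in milliseconds, or None if the function was never called."""
    samples = timings.get(name)
    return 1000 * sum(samples) / len(samples) if samples else None


@timed
def load_customer(customer_id):
    time.sleep(0.01)  # stand-in for real work
    return {"id": customer_id}


for cid in range(3):
    load_customer(cid)
print(round(average_ms("load_customer")))  # roughly 10, depending on the machine
```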

Your current approach does sound like a headache. If your logging lines are in the standard NLog format, you should be able to drop in Serilog without many changes.

[1] https://github.com/serilog-metrics/serilog-metrics

We've evaluated Loggly, Logentries, Splunk, and Seq. The first three are fine depending on your logging needs. Seq can handle a TON of events thrown at it, and the latest data (roughly the last day) is extremely accessible, though older data takes a little longer to search through.
We're currently using Splunk (and may move to the ELK stack) for logging, but some types of "application events" are really more useful as metrics. We're using Ganglia for those metrics and limiting application logs to actions that are needed for audit purposes and for warning and error-level application problems.

Using a system like Ganglia (or Etsy's StatsD) is important here, since the OP's original question included how to limit the size of logged data. These systems provide a natural way to aggregate data instead of storing every individual event.
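A minimal Python sketch of why this limits volume: only a per-interval summary (count, mean, max) is kept, so storage grows with the number of metric names rather than the number of events. `statsd_line` follows the plain StatsD text protocol (`name:value|type`, sent over UDP, port 8125 by default); the `Aggregator` class is a hypothetical stand-in for what a StatsD daemon does on each flush.

```python
import socket
from collections import defaultdict


def statsd_line(name, value, kind):
    """Format one StatsD datagram: <name>:<value>|<type> (c=counter, ms=timer, g=gauge)."""
    if kind not in ("c", "ms", "g"):
        raise ValueError("unsupported metric type: %r" % kind)
    return "%s:%s|%s" % (name, value, kind)


def send(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; no reply is expected from the daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


class Aggregator:
    """Roll raw timings up into one summary per metric name per flush."""

    def __init__(self):
        self.samples = defaultdict(list)

    def timing(self, name, ms):
        self.samples[name].append(ms)

    def flush(self):
        summary = {
            name: {"count": len(v), "mean": sum(v) / len(v), "max": max(v)}
            for name, v in self.samples.items()
        }
        self.samples.clear()  # raw samples are discarded after each interval
        return summary
```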

I'd recommend looking at Graylog. It uses Elasticsearch under the hood and wraps it in an application focused specifically on log management. https://www.graylog.org/
Graylog is absolutely brilliant. Storing several TB of data in it, using it for alerting/monitoring (you can configure streams, which you can think of as constrained views of logging data), etc. Highly recommend it.
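A rough Python sketch of the "streams as constrained views" idea: each stream is a set of field rules, and an event shows up in every stream whose rules it matches, so one event can feed several views and alerts. This is a hypothetical illustration using only exact-match rules, not Graylog's API or its full rule set.

```python
def make_stream(**rules):
    """Return a predicate that matches events whose fields equal every rule."""
    def matches(event):
        return all(event.get(field) == value for field, value in rules.items())
    return matches


def route(events, streams):
    """Assign each event to every stream whose rules it satisfies."""
    views = {name: [] for name in streams}
    for event in events:
        for name, predicate in streams.items():
            if predicate(event):
                views[name].append(event)
    return views
```

An alert is then just a condition on a stream, for example firing when `len(views["shop-errors"])` exceeds some threshold within a time window.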
Is graylog free?
Yes.