by rewmie 998 days ago
> 3) Switch out log4j for Kafka, which will handle the persistence & multiplexing for you.

I don't think this is a reasonable statement. There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka. Bringing up Kafka sounds like a case of a solution looking for a problem.


> I don't think this is a reasonable statement. There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka. Bringing up Kafka sounds like a case of a solution looking for a problem.

If it's data you care about then you put it in Kafka, unless you're big enough to use something like Cassandra or rich enough to pay a cloud provider to make redundant data storage their problem. Logs are something that you need to write durably and reliably when shit is hitting the fan and your networks are flaking and machines are crashing - so ephemeral disks are out, NFS is out, ad-hoc log collector gossip protocols are out, and anything that relies on single master -> read replica and "promoting" that replica is definitely out.

Kafka is about as lightweight as it gets for anything that can't be single-machine/SPOF. It's a lot simpler and more consistent than any RDBMS. What else would you use? HDFS (or maybe OpenAFS if your ops team is really good) is the only half-reasonable alternative I can think of.

OK, but then how do you perform ad hoc queries on everything you logged to Kafka when it's time to debug an issue?

There are plenty of well known, battle tested solutions for solving that problem with old school logging.

> OK, but then how do you perform ad hoc queries on everything you logged to Kafka when it's time to debug an issue?

Again I'd say treat it like data you care about. Use your best guess at a primary identifier as the record key. Depending on your data volume, do some indexing/pre-aggregation around other facets that you think you might want to query on (which might include materialising everything in ksqlDB, or even in some other datastore), and accept that occasionally you're going to have to do a slow full scan.
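
A rough sketch of that approach, using a plain in-memory list as a stand-in for a topic. This is not a Kafka client; `LogStore`, the `req-…` keys, and the `service` facet are hypothetical names picked for illustration. Records are keyed by a best-guess primary identifier, one extra facet is indexed up front, and anything else falls back to a slow full scan.

```python
from collections import defaultdict


class LogStore:
    """Toy stand-in for a keyed log plus one pre-built facet index."""

    def __init__(self, indexed_facet):
        self.records = []                  # append-only, like a topic
        self.by_key = defaultdict(list)    # record key -> positions
        self.by_facet = defaultdict(list)  # facet value -> positions
        self.indexed_facet = indexed_facet

    def append(self, key, event):
        pos = len(self.records)
        self.records.append((key, event))
        self.by_key[key].append(pos)
        if self.indexed_facet in event:
            self.by_facet[event[self.indexed_facet]].append(pos)
        return pos

    def lookup_key(self, key):
        # Cheap: direct lookup by the record key.
        return [self.records[p][1] for p in self.by_key.get(key, [])]

    def lookup_facet(self, value):
        # Cheap: lookup through the index built at write time.
        return [self.records[p][1] for p in self.by_facet.get(value, [])]

    def full_scan(self, predicate):
        # Slow path: reads every record, for queries nobody planned for.
        return [event for _, event in self.records if predicate(event)]


store = LogStore(indexed_facet="service")
store.append("req-1", {"service": "checkout", "level": "ERROR", "msg": "card declined"})
store.append("req-2", {"service": "search", "level": "INFO", "msg": "query ok"})
store.append("req-1", {"service": "payments", "level": "ERROR", "msg": "gateway timeout"})

by_request = store.lookup_key("req-1")
by_service = store.lookup_facet("search")
errors = store.full_scan(lambda e: e["level"] == "ERROR")
```

Here `level` was never indexed, so the error query has to walk the whole log; that is the "occasional slow full scan" trade-off.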

> There are plenty of well known, battle tested solutions for solving that problem with old school logging.

Splunk was just bought for $28B because none of those "well known, battle tested solutions" are any good. (Splunk also sucks! It just sucks a little less than the other options).

Do you want to debug what happened to your business entities, or do you want to debug what happened in your logs? Because if they're different things, those are different questions.

> There are plenty of well known, battle tested solutions for solving that problem with old school logging.

And you can run them in parallel (and without interference) by having them ingest from Kafka.

> There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka.

What are they? Because admittedly I've lost a little love for the operational side of Kafka, and I wish the client-side were a little "dumber", so I could match it better to my use cases.

I think OP meant event sourcing.

> I think OP meant event sourcing.

That is really beside the point. Logging and tracing have always been fundamentally event sourcing, but that never forced anyone to onboard onto freaking Kafka, of all event streaming/messaging platforms.

This kind of suggestion sounds an awful lot like resume-driven development instead of actually putting together a logging service.

Hard disagree, Kafka is one of the simplest lowest maintenance tools for this with excellent language support and would probably be the first choice for anyone not paying $cloud_vendor for a managed durable queue.

The first step in building a reliable logging system is setting up high-write-throughput, highly available, FIFO-ish durable storage. Once you have that, everything else gets a lot easier.

* Once the log is committed to the durable queue, that's it: the application can move on, secure that the log isn't going to get lost.

* Multiple consumer groups can process the logs for different purposes; the usual ones are one group for persisting the logs to a searchable index and one group for real-time alerting.

* Everything downstream from Kafka can be far less reliable, because a failure there just shows up as the queue backing up.

* You can fake more throughput than you actually have in your downstream processors, because it just manifests as a lagging offset.
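
The points above can be illustrated with a toy in-memory model. This is not Kafka's API; `DurableLog`, the group names, and the batch sizes are made up for this sketch. The producer is done once `append` returns, each consumer group keeps its own offset, and a slow group just shows up as lag.

```python
class DurableLog:
    """Toy append-only log with per-consumer-group offsets."""

    def __init__(self):
        self.entries = []
        self.offsets = {}  # group name -> next offset to read

    def append(self, entry):
        # Assumption: a real durable queue would return only after the
        # write is safely stored; here it is just a list append.
        self.entries.append(entry)
        return len(self.entries) - 1

    def poll(self, group, max_records):
        # Each group reads from its own offset, independent of the others.
        start = self.offsets.get(group, 0)
        batch = self.entries[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        # How far this group is behind the end of the log.
        return len(self.entries) - self.offsets.get(group, 0)


log = DurableLog()
for i in range(10):
    log.append(f"log line {i}")

indexed = log.poll("indexer", max_records=10)   # fast consumer keeps up
alerted = log.poll("alerting", max_records=3)   # slow consumer falls behind

lag_indexer = log.lag("indexer")
lag_alerting = log.lag("alerting")
```

The slow `alerting` group falls behind without affecting the `indexer` group or the producer; nothing is dropped, the backlog is simply read later.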

> Hard disagree, Kafka is one of the simplest lowest maintenance tools for this (..)

You sound like you've been using an entirely different project named Kafka, because the Kafka everyone uses is renowned among message brokers for its complexity and operational overhead.

I might be; it's one of the lowest-touch services we run. But we aren't doing the "Kafka all the things" model where every single little app is hooked into it for generic message passing. It's simply logs go in, logs go out, nothing else.

The business-logic message passing goes through Rabbit because we wanted out-of-order processing, priority routing, retry queues, blah blah.