Hacker News
by lmm 998 days ago
> I don't think this is a reasonable statement. There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka. Bringing up Kafka sounds like a case of a solution looking for a problem.

If it's data you care about, then you put it in Kafka, unless you're big enough to use something like Cassandra or rich enough to pay a cloud provider to make redundant data storage their problem. Logs are something you need to write durably and reliably when shit is hitting the fan, your networks are flaking and machines are crashing. So ephemeral disks are out, NFS is out, ad-hoc log collector gossip protocols are out, and anything that relies on single master -> read replica and "promoting" that replica is definitely out.
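A minimal sketch of what "write durably" can mean in Kafka terms. It assumes the confluent-kafka / librdkafka configuration key names. The function names, broker addresses, replication factor of 3 and `min.insync.replicas` of 2 are illustrative assumptions, not anything the commenter specified.

```python
def durable_log_producer_config(bootstrap_servers: str) -> dict:
    """Producer settings (librdkafka / confluent-kafka key names) that trade latency for durability."""
    return {
        "bootstrap.servers": bootstrap_servers,
        # The leader acknowledges a write only once every in-sync replica has it.
        "acks": "all",
        # Retries after a flaky network cannot duplicate or reorder records in a partition.
        "enable.idempotence": True,
    }


# Topic-side settings, applied when the log topic is created.
# Illustrative values: 3 copies, and at least 2 must be in sync for a write to be accepted.
LOG_TOPIC_REPLICATION_FACTOR = 3
LOG_TOPIC_CONFIG = {"min.insync.replicas": "2"}


def acked_write_survives(broker_failures: int, min_insync_replicas: int) -> bool:
    # With acks=all, an acknowledged record already sits on at least
    # min.insync.replicas brokers, so losing fewer brokers than that cannot lose it
    # (assuming unclean leader election stays disabled, which is the default).
    return broker_failures < min_insync_replicas
```

The returned dict is the kind of mapping you would hand to `confluent_kafka.Producer(...)`. With these example numbers, one broker can die without losing any acknowledged log line, and no replica ever has to be manually "promoted".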

Kafka is about as lightweight as it gets for anything that can't be single-machine/SPOF. It's a lot simpler and more consistent than any RDBMS. What else would you use? HDFS (or maybe OpenAFS if your ops team is really good) is the only half-reasonable alternative I can think of.

OK, but then how do you perform ad hoc queries on everything you logged to Kafka when it's time to debug an issue?

There are plenty of well known, battle tested solutions for solving that problem with old school logging.

> OK, but then how do you perform ad hoc queries on everything you logged to Kafka when it's time to debug an issue?

Again I'd say treat it like data you care about. Use your best guess at a primary identifier as the record key, depending on your data volume do some indexing/pre-aggregation around other facets that you think you might want to query on (which might include materialising everything in ksqldb, or even in some other datastore), and accept that occasionally you're going to have to do a slow full scan.
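A toy, in-memory sketch of that approach. This is not a Kafka client. Records are routed to a partition by their key, one facet (`status`, chosen arbitrarily) is indexed at write time, and queries on any other facet fall back to a full scan. The class and method names are hypothetical, and `zlib.crc32` stands in for Kafka's actual key hash (murmur2).

```python
import zlib
from collections import defaultdict


class KeyedLog:
    """Toy stand-in for a keyed, partitioned topic with one write-time secondary index."""

    def __init__(self, num_partitions=4, indexed_facets=("status",)):
        self.partitions = [[] for _ in range(num_partitions)]
        self.indexed_facets = set(indexed_facets)
        # Secondary index: (facet, value) -> list of (partition, offset), in append order.
        self.facet_index = defaultdict(list)

    def partition_for(self, key):
        # crc32 is a deterministic stand-in; Kafka's default partitioner uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, event):
        p = self.partition_for(key)
        self.partitions[p].append((key, event))
        offset = len(self.partitions[p]) - 1
        for facet in self.indexed_facets:
            if facet in event:
                self.facet_index[(facet, event[facet])].append((p, offset))
        return p, offset

    def by_key(self, key):
        # Every event for a key lands in the same partition, so only that partition is read.
        part = self.partitions[self.partition_for(key)]
        return [event for k, event in part if k == key]

    def by_facet(self, facet, value):
        if facet in self.indexed_facets:
            return [self.partitions[p][o][1] for p, o in self.facet_index[(facet, value)]]
        # Unindexed facet: accept the occasional slow full scan over every partition.
        return [event for part in self.partitions for _, event in part if event.get(facet) == value]
```

For example, after appending a few events keyed by order ID, `log.by_key("order-42")` returns that order's events in the order they were written, `log.by_facet("status", "failed")` uses the index, and `log.by_facet("region", "eu")` scans everything.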

> There are plenty of well known, battle tested solutions for solving that problem with old school logging.

Splunk was just bought by Cisco for $28B because none of those "well known, battle tested solutions" are any good. (Splunk also sucks! It just sucks a little less than the other options.)

Do you want to debug what happened to your business entities, or do you want to debug what happened in your logs? Because if they're different things, those are different questions.

> There are plenty of well known, battle tested solutions for solving that problem with old school logging.

And you can run them in parallel (and without interference) by having them ingest from Kafka.
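A toy model of why that works. Kafka tracks a committed offset per consumer group, so each downstream tool reads the same records at its own pace without affecting the others. This is a single-partition, in-memory sketch that only mimics that behaviour. The class, method and group names (`search-indexer`, `metrics-rollup`) are hypothetical.

```python
from collections import defaultdict


class Topic:
    """Toy single-partition log with an independent committed offset per consumer group."""

    def __init__(self):
        self.records = []
        # group id -> next offset that group will read; new groups start at 0.
        self.committed = defaultdict(int)

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        start = self.committed[group]
        batch = self.records[start:start + max_records]
        # Only this group's offset moves; every other group is unaffected.
        self.committed[group] = start + len(batch)
        return batch
```

Two groups polling the same topic each see every record. A group added later starts from the beginning, much like a real consumer configured with `auto.offset.reset=earliest`, so a new log tool can backfill without touching the ones already running.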