by stephenlf 13 days ago
Can’t wait to see the next iteration of this idea with “Logs are all you need for durable workflows.”

Wait no further. It's already happening.

One reason why a "logs are all you need" solution may fail: untrusted-log-as-injection[1].

Check those SBOMs, and don't forget to include their CI/CD pipelines[2].

[1] https://news.ycombinator.com/item?id=48315440

[2] https://github.com/jqwik-team/jqwik/issues/708#issuecomment-...

In all seriousness, I’d take an “S3 is all you need for durable workflows” and use it in data processing applications that move data from S3 -> S3 with no other dependencies.
Yep. But we all know that one machine can and will fail (or be patched and restarted), so the log needs to be distributed.

Different workflows should probably go in different buckets or "topics" for clarity. Since it's distributed, the system must guarantee that the log items are stored in the same order (at the same "offsets") on every node.
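
For what it's worth, here is a minimal in-memory sketch of the "topics plus offsets" idea. Everything in it (`Replica`, `Leader`, the topic names) is hypothetical and made up for illustration; it assumes a single leader that assigns offsets and it ignores networking, failover, and persistence entirely. It only shows why a shared offset per topic forces every node to hold items in the same order.

```python
from collections import defaultdict


class Replica:
    """One node's copy of the log: a list of items per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def apply(self, topic, offset, item):
        entries = self.topics[topic]
        if offset < len(entries):
            # Redelivery of an entry we already hold: it must match exactly.
            if entries[offset] != item:
                raise ValueError(f"conflict at {topic}[{offset}]")
            return
        if offset != len(entries):
            # Refuse to skip ahead; a gap would break the ordering guarantee.
            raise ValueError(f"gap before {topic}[{offset}]")
        entries.append(item)


class Leader:
    """Assigns the next offset for a topic and pushes it to every follower."""

    def __init__(self, followers):
        self.local = Replica()
        self.followers = followers

    def append(self, topic, item):
        offset = len(self.local.topics[topic])
        self.local.apply(topic, offset, item)
        for follower in self.followers:
            follower.apply(topic, offset, item)
        return offset


f1, f2 = Replica(), Replica()
leader = Leader([f1, f2])
leader.append("orders", "created")
leader.append("orders", "paid")
leader.append("billing", "invoice-sent")
print(f1.topics["orders"] == f2.topics["orders"])
```

A real system would need consensus or at least acknowledgements before treating an append as committed; this sketch only enforces the ordering rule.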

Not a bad way to do things.

Are logs all you need for durable workflows? I'm confused here. How would you persist and query nested or related data over logs? By logs I assume you mean something like Elasticsearch or Meilisearch?
Pretty much every durable system has an intent log of some sort. The log provides durability, the database system just integrates that log into a more queryable format.
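
To make that concrete, a toy sketch (not how any particular database does it): every change is appended to a log file and fsynced first, then folded into an in-memory dict that is easy to query. The class name `TinyKV`, the JSON-lines record format, and the `op`/`key`/`value` fields are all assumptions invented for this example.

```python
import json
import os


class TinyKV:
    """Toy key-value store: the log is the durable part, the dict is derived."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.state = {}
        self._replay()  # rebuild the queryable state from the log
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    # Simplification: stop at a torn final record. A real WAL
                    # would checksum records and truncate the damaged tail.
                    break
                self._apply(record)

    def _apply(self, record):
        if record["op"] == "set":
            self.state[record["key"]] = record["value"]
        elif record["op"] == "del":
            self.state.pop(record["key"], None)

    def _log(self, record):
        # Write-ahead: the intent reaches disk before the state changes.
        self._wal.write(json.dumps(record) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())

    def set(self, key, value):
        record = {"op": "set", "key": key, "value": value}
        self._log(record)
        self._apply(record)

    def delete(self, key):
        record = {"op": "del", "key": key}
        self._log(record)
        self._apply(record)

    def close(self):
        self._wal.close()
```

Opening a second `TinyKV` on the same path after a crash or restart replays the log and ends up with the same `state`, which is the whole point: the dict is disposable, the log is not.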
I swear it didn't occur to me that that meant a WAL, makes much more sense now LOL
I assume they meant a log like a WAL. A WAL should be (quite literally?) all you need for durable workflows.

A distributed WAL (to survive a machine death) would also probably be something I'd want, and … something I'm not sure you're getting directly from SQLite.

Is it common to use logs as a proxy for write-ahead logs?
Folks, this is meant to be an honest question, not a snarky comment. I'm not a DBA, I'm DevOps/SRE, and logs for me always meant execution logs. I'm just curious whether, among people who work in the database domain, "logs" is used to refer to WALs.
I think the original poster in this thread was joking. A fair number of databases use "logs" as a core mechanism for storing and sorting data. "Logs" in this context is not to be confused with the stdout/stderr output that you may collect from a running program and forward somewhere like CloudWatch, Elasticsearch, etc. "Logs" in the context of databases refers to the data structure, which can generally be defined as simply an append-only "file" (I put "file" in quotes because just because something is a log does not mean it is necessarily written to disk yet; that's why write-ahead logs exist). It's not just write-ahead logs.

Google "Log-structured merge trees" if you want an interesting read.

I read the parent's comment as sarcasm and not a serious suggestion.
Log as in the structure.
Shortly followed by:

"Sockets are all you need for durable workflows" and then finally "Kernel primitives are all you need for durable workflows."

But seriously, part of being a professional is using the right tool for the job.

Pardon my ignorance trying to follow up on what is most likely sarcasm but is this not Kafka's claim to fame?

I am joining a new project and need to know to what extent Kafka is still a part of the future for new big data projects. It doesn't seem like there are alternatives at the high end but instead the question is when other technologies (that are easier to manage, require less compute, etc.) max out.

> Pardon my ignorance trying to follow up on what is most likely sarcasm but is this not Kafka's claim to fame?

Yes

> I am joining a new project and need to know to what extent Kafka is still a part of the future for new big data projects.

It's not gonna win or die on merit.

If I were to sit here and propose that you mutate bank accounts in-place (which I'm more or less doing by analogy right now over on https://news.ycombinator.com/item?id=48339103), and you just need safe enough locking technology to do so, I'd be immediately and rightfully shouted down that no one does that and you write down the money transfers and derive the final balances from that.

Manage any other kind of state, however, by appending state-changes and rolling them into a derived state, and then you're chasing fads and doing resume-driven development. So I'm skeptical about the future of this way of doing things.
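
The bank-account version of this fits in a few lines. This is only an illustrative sketch: the `Transfer` record, the `balances` function, and the account names are made up, and amounts are integer cents to avoid float rounding. The ledger is append-only and the balances are derived by replaying it, never mutated in place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One immutable state change: move `cents` from `src` to `dst`."""
    src: str
    dst: str
    cents: int


def balances(ledger):
    """Derive the current balances by folding over every recorded transfer."""
    totals = defaultdict(int)
    for t in ledger:
        totals[t.src] -= t.cents
        totals[t.dst] += t.cents
    return dict(totals)


# The ledger is only ever appended to; nothing is updated in place.
ledger = []
ledger.append(Transfer("alice", "bob", 500))
ledger.append(Transfer("bob", "carol", 200))
ledger.append(Transfer("carol", "alice", 50))

print(balances(ledger))
```

Swap "transfer" for any other state change and "balance" for any other derived view, and it is the same pattern.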