by StabbyCutyou 3478 days ago
I didn't discuss queues because queues aren't intrinsically a part of a MS architecture.

You certainly can do it that way, and there are lots of benefits, but queues (and if you're doing this at scale, distributed queues) are an entirely distinct subject matter that I chose not to touch on for this.

Also, things can still get lost with queues. Network failures, partitions, bugs in your own publishers and consumers... nothing is foolproof, and it takes a lot of work to reduce that loss rate to 0.


> I didn't discuss queues because queues aren't intrinsically a part of a MS architecture.

I find this an odd juxtaposition. On one hand I see what you are saying, but on the other I believe the nature of MS architecture places a huge emphasis on state and state management that isn't necessarily as large a focus in monolithic patterns. Stateless versus stateful design, and now the creative application of "soft"-state design patterns, become non-trivial concerns in an MS world.

To that end, I feel that some discussion (positive or negative) of the impact MS architecture has on state, along with the pros and cons, would be topical.

I was also wondering, for my own education, if you could expound a little on your points here:

> Also, things can still get lost with queues. Network failures, partitions, bugs in your own publishers and consumers

Specifically:

> Network failures

I had always thought that the delivery guarantees of TCP (and maybe the new QUIC protocol over UDP) were robust against network failures. What kind of edge cases would result in data loss in a queue context?

> partitions

I am entirely ignorant of how partitions might result in data loss... so what kind of things have you seen?

> bugs in your own publishers and consumers

Isn't this true across any application regardless of architecture? Or are you implying that MS architecture/queues somehow increase the likelihood of bugs by nature?

Thanks in advance! (I always enjoy learning new things and hearing about/learning from others' experiences)

So, specifically in the context of a distributed queue (which, if we're talking about using MS arch + queueing due to scale concerns, you really need some kind of distribution to the message queue imo), these things get a lot harder.

Just because TCP provides resilience doesn't mean that you're perfectly defended against all kinds of issues here.

I'm mostly talking about things in the category of bugs in the software layer using TCP (the drivers, the consumers/publishers, race conditions, batching messages, not receiving ACKs, etc). There are a lot of little things that can go wrong.
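To make the "not receiving ACKs" case concrete, here is a minimal sketch, assuming a toy in-memory queue. `InMemoryQueue` and `run_consumer` are hypothetical names for illustration, not any real broker's API. It shows that TCP can deliver the bytes perfectly and the message can still be lost if the broker treats "delivered" as "done" and the consumer dies before processing it.

```python
from collections import deque


class InMemoryQueue:
    """Toy broker (hypothetical): hands out messages and tracks unacked ones."""

    def __init__(self, messages):
        self.pending = deque(messages)
        self.unacked = []

    def receive(self, auto_ack):
        if not self.pending:
            return None
        msg = self.pending.popleft()
        if not auto_ack:
            # Remember the message until the consumer confirms it was processed.
            self.unacked.append(msg)
        return msg

    def ack(self, msg):
        if msg in self.unacked:
            self.unacked.remove(msg)

    def redeliver_unacked(self):
        # Called when the broker notices the consumer went away.
        self.pending.extend(self.unacked)
        self.unacked.clear()


def run_consumer(queue, auto_ack, crash_on):
    """Drain the queue, simulating a single consumer crash on `crash_on`."""
    processed = []
    crashed = False
    while True:
        msg = queue.receive(auto_ack)
        if msg is None:
            break
        if msg == crash_on and not crashed:
            crashed = True
            # The consumer dies after receiving the message but before doing
            # the work; the broker then requeues whatever was never acked.
            queue.redeliver_unacked()
            continue
        processed.append(msg)
        if not auto_ack:
            queue.ack(msg)
    return processed


with_auto_ack = run_consumer(InMemoryQueue(["a", "b", "c"]), auto_ack=True, crash_on="b")
with_manual_ack = run_consumer(InMemoryQueue(["a", "b", "c"]), auto_ack=False, crash_on="b")
print(with_auto_ack)    # ['a', 'c']  -> "b" was delivered but never processed
print(with_manual_ack)  # ['a', 'c', 'b']  -> "b" was redelivered, out of order
```

Note that the ack-after-processing variant trades loss for reordering (and, in a real system, possible duplicates), which is its own class of bug to handle.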

In terms of partitions, look up the "Jepsen" series by Aphyr. It goes over the CAP theorem as it applies to distributed datastores and queues. His examples and tests will demonstrate the concepts behind partitioning better than I can explain in an HN comment :)
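As a rough illustration of one way a partition can lose acknowledged messages, here is a toy sketch. It assumes asynchronous replication and a failover that promotes a follower that missed some writes; that is only one of several possible mechanisms, and `Node` and `publish` are hypothetical names, not a real queue's API.

```python
class Node:
    """Hypothetical queue node holding an append-only message log."""

    def __init__(self, name):
        self.name = name
        self.log = []


def publish(leader, followers, msg, partitioned):
    """Append to the leader and ack at once; replicate only if reachable."""
    leader.log.append(msg)
    if not partitioned:
        for follower in followers:
            follower.log.append(msg)
    # Assumption: the leader acks without waiting for replication.
    return "ack"


leader = Node("n1")
follower = Node("n2")

acked = []
# The partition starts after m1: m2 and m3 reach only the isolated leader.
for msg, partitioned in [("m1", False), ("m2", True), ("m3", True)]:
    if publish(leader, [follower], msg, partitioned) == "ack":
        acked.append(msg)

# Failover: the isolated leader is declared dead and the follower is promoted.
new_leader = follower
lost = [m for m in acked if m not in new_leader.log]
print(lost)  # ['m2', 'm3']
```

The publisher saw three successful acks, yet two of those messages no longer exist on the node that is now serving traffic.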

And yes, these types of failures are implicit everywhere, but every additional layer you add, every hop in the chain, and every interaction added to the request flow increases the surface area for problems. This is especially true once you push high scale with hundreds of nodes, become NIC- or CPU-bound, and so on.
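A back-of-the-envelope sketch of the "every hop adds surface area" point. It assumes each hop fails independently with the same probability, which real systems rarely satisfy, and the 0.999 figure is purely illustrative rather than a measurement. `end_to_end_success` is a hypothetical helper.

```python
def end_to_end_success(per_hop_success, hops):
    """Probability a message survives every hop, assuming independent hops."""
    return per_hop_success ** hops


# Illustrative only: a 99.9% per-hop success rate across longer and longer chains.
for hops in (1, 5, 20):
    rate = end_to_end_success(0.999, hops)
    print(f"{hops:>2} hops: {rate:.4f} delivered, {1 - rate:.4f} lost")
```

Even with a very reliable individual hop, the loss rate grows roughly in proportion to the number of hops, which is why adding a queue (and its publishers, consumers, and replicas) is not free.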

There is a lot to unpack here, and it's not as simple as it seems on the face of it.