by treis 1392 days ago
This doesn't actually tell you how far you can go without a message queue. Unsurprising because they're trying to sell one. But you can go very far without them by using the DB or redis as your message queue. Probably fine for 99%+ of applications.

Don't add a message queue until you really need one. And if you do, make sure you account for it being down, running locally, back pressure if the queue gets full, monitoring, and logging at least.

Many of the message queues I have seen are only necessary because of personalization or analytics features that the user doesn't want and the business probably doesn't need.

When I try to estimate ROI on those features it's usually minuscule. A/B tests are unconvincing and the aggregate reports synthesized from the data look more like numerology than rigorous science. Moreover, senior leadership usually doesn't realize how much additional liability they take on by collecting and storing so much customer data.

It's not 2010. The large tech firms are your direct competitors and regulators across the world have caught up, including in many US states. If you have a primary product for which people will pay money, then the "surveillance agency as a side-business" model of the 2000s-2010s is probably a bad idea. It's a net expense that exposes you to liability and distracts your team.

Additional benefit: without real-time analytics and personalization, you can go even further without a message queue.

> because of personalization or analytics features

That's not the architectural reason for message queues in my experience.

Primarily, a message queue is used when there is potential for a bottleneck in overall application throughput because reacting to some event is costly. Instead of reacting to the event synchronously, you queue the event and react to it asynchronously.

Some very common use cases:

1. Your UI. Operating systems use a message queue to capture and forward inputs. Your mouse and keyboard events are all queued in a message queue.

2. Webhooks. Many webhook origins have timeouts on responses and your application must respond within a given window or it will be throttled or downgraded. In this case, the best practice is to queue the event to be processed asynchronously (that queue can be a simple database table with your own logic and wrapper around it or something like AWS SQS, Azure Service Bus, or Google Pub/Sub).

3. Mitigating Throughput Bottlenecks. Most applications have a read/write asymmetry so it makes sense to optimize your architecture to scale for reads. But what if your application occasionally has to handle a burst of writes? Should you size your infrastructure for that case? One approach is to proxy the writes through a queue so you can size the infrastructure for a maximum throughput that is managed by the queue. For example, instead of 1000 concurrent writes per second, a queue can capture the write mutations and trickle out only 100 concurrent writes per second. Instead of sizing your application to scale to handle 1000 writes per second, you only need to size your queue to handle that scale.

4. Resiliency. If a message fails, it can be retried according to whatever heuristics make sense for the domain. Sure, you can use a simple loop to retry, but every message queue provides some mechanism for handling retries, failed message delivery, and so on. If you decide to roll your own and log a failed call into a database to try it again later...well, you've effectively captured a message in a custom queue.
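A minimal sketch of points 2 to 4 above, assuming an in-process `queue.Queue` stands in for the real queue (a database table or SQS would play the same role). `receive`, `flaky_handler`, the `failures` table, and the `MAX_ATTEMPTS` budget are hypothetical names for illustration: the receiver only enqueues and returns, while a single worker processes messages and requeues failures until the budget runs out.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # assumption: retry budget per message

jobs = queue.Queue()
processed = []
dead_letters = []
failures = {"b": 2, "c": 99}  # hypothetical: "b" fails twice, "c" always fails


def receive(message):
    """Webhook-style receiver: enqueue and return immediately."""
    jobs.put((message, 1))


def flaky_handler(message):
    """Stand-in for the expensive work; fails according to `failures`."""
    if failures.get(message, 0) > 0:
        failures[message] -= 1
        raise RuntimeError("transient failure")
    processed.append(message)


def worker():
    while True:
        message, attempt = jobs.get()
        try:
            flaky_handler(message)
        except RuntimeError:
            if attempt < MAX_ATTEMPTS:
                jobs.put((message, attempt + 1))  # requeue for another try
            else:
                dead_letters.append(message)  # give up, keep for inspection
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
for message in ("a", "b", "c"):
    receive(message)
jobs.join()  # wait until every message is processed or dead-lettered
print(processed, dead_letters)  # prints ['a', 'b'] ['c']
```

The retry is put back on the queue before `task_done()` runs, so `jobs.join()` cannot return while a retry is still outstanding.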

User tracking and/or personalization tick 2-3 of your 4 boxes:

UI - Logging mouse movement, keyboard events, and other types of attention proxies.

Webhooks and/or Bottlenecks - calling out to either internal or third party classifiers, or more recently even generative models, for personalization based on user tracking data.

And I don't think this is just my unlucky experience. The fine article includes user tracking as one of only three explicitly enumerated reasons for queuing:

> Why? Because users increasingly expect a real-time experience. In use cases like order flows, webhooks, user tracking, etc. users expect to be able to see the new data in the user interface instantly, instead of having to wait for some background batch processing to periodically reload.

Of the explicitly enumerated motivations in the article:

1. "Order flows" - tracking/modification is often one of the higher latency items in order flows ("you might also like" / "what to order next" features).

2. "Webhooks" - often used for tracking/personalization

3. "User tracking" - ...this one is easy :)

Webhooks go far beyond personalization and tracking; it's a general purpose integration pattern.

Yes, and I've worked on browser-based games that are built on top of webhooks. But what percentage of people could forego webhooks entirely if they weren't doing any personalization or tracking? Or could at least get away with webhooks without any "real" queuing infra? I'd wager a large number.

Not really. Any payment system requires webhooks

Shameless (but relevant!) plug: we built Svix to help people send webhooks from their platform. With Svix you don't have to use message queues for webhooks, at least not from the sender side: https://www.svix.com/

Note, though: message queues are great. They make asynchronous operations and interacting with external services much more resilient.

"But you can go very far without them by using the DB or redis as your message queue."

So, you can go very far without a message queue... by using a simpler message queue?

I mean, at some point you have to consider something a "proper" message queue. Otherwise, at the lower extreme your web server would act as a message queue.

"Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker"

Let's draw the line somewhere the other side of applications that advertise themselves as having that function?

The number of people that think that Redis is a KV store and nothing else is too damn high.

Redis is not a database. it doesn’t have proper transaction support.

it’s basically a cache with a fancy API so you can incrementally update data structure instead of having to constantly write the whole thing you are trying to cache.

Sure it is! Redis is even fully ACID if you use the right persistence settings complete with transactions support and a WAL. You can even set up locking to abort/rollback transactions. In DB parlance Redis’ isolation level is strict serializability which is stronger than Postgres.

If you mean “it doesn’t have every feature of Postgres” and behaves a little differently then sure but it absolutely is a database.

Actually a more complicated message queue, because it's on you to implement all the queueing logic. Then back it by a potentially non-durable data store (it's very tricky to get Redis durable and HA) with no monitoring?

All of this to avoid spinning up an SQS queue?

Isn't that a bit of an odd wording though. Of course if you only buy the horse without the cart you have to build the cart, making "your" solution more complicated. But you still bought the "simpler" solution i.e. the horse without a cart.

Complexity may be the wrong word, but what I mean is surface area for bugs you are responsible for.

I also question whether you really bought a horse in this analogy.

By using something you're already using.

And all the transactional challenges that having more than one persistent store brings with it. Either need to have XA orchestration or implement the saga pattern. Either way it's a massive complexity overhead if transactional consistency between the queue and your database is important for your application.

it’s usually done using the outbox pattern.

i’m not sure if this pattern can be called a type of saga.

you basically have a tiny queue in your db using a special table that you drain to an external queue server.
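A rough sketch of that outbox idea, using SQLite so it runs anywhere. The table names (`orders`, `outbox`), the function names, and the `publish` callable standing in for the external queue client are all assumptions for illustration, not any particular library's API.

```python
import json
import sqlite3


def place_order(conn, order_id, total):
    """Write the business row and the outbox row in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total)
        )
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "order_placed", "order_id": order_id}),),
        )


def drain_outbox(conn, publish):
    """Forward pending outbox rows to the external queue, oldest first."""
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(payload)  # if this raises, the row stays for the next run
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
)

place_order(conn, 1, 9.99)
place_order(conn, 2, 25.00)

sent = []  # hypothetical stand-in for the external queue
drained = drain_outbox(conn, sent.append)
```

If `publish` succeeds but the delete fails, the row is sent again on the next drain, so this gives at-least-once delivery and consumers need to tolerate duplicates.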

You probably don't need a message queue if you have Redis. And quite a lot of code surrounding it. Which also makes it a message queue. Example: [0]

What you might mean is that you might not need a complicated server setup e.g. Kafka for simple message queues?

[0]:https://github.com/coleifer/huey/

Azure Event Hubs and Kafka are not message queues.

A real queue has an API similar to JMS, e.g. RabbitMQ, IBM MQ, Microsoft MQ, …

The difference is that you have a mailbox of messages, from which you remove the messages you have consumed by acknowledging reception. A message will be sent to subscribers until it is acknowledged.

So the publisher writes messages m1, m2, m3.

The subscriber receives m1, m2, m3 and acks only m3.

It will later receive m1 and m2 again, but not m3.
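The m1/m2/m3 example can be sketched with a toy in-memory mailbox. `Mailbox` and its methods are hypothetical names, not the API of JMS, RabbitMQ, or any real broker; the only point is that unacknowledged messages stay deliverable.

```python
class Mailbox:
    """Toy in-memory queue: messages are redelivered until acknowledged."""

    def __init__(self):
        self._pending = {}  # message id -> body, kept in publish order
        self._next_id = 1

    def publish(self, body):
        self._pending[self._next_id] = body
        self._next_id += 1

    def receive(self):
        """Return every unacknowledged (id, body) pair, oldest first."""
        return list(self._pending.items())

    def ack(self, message_id):
        """Remove a message so it is never delivered again."""
        self._pending.pop(message_id, None)


box = Mailbox()
for body in ("m1", "m2", "m3"):
    box.publish(body)

first = box.receive()   # delivers m1, m2, m3
box.ack(first[2][0])    # the subscriber acks only m3
second = box.receive()  # m1 and m2 come back, m3 does not

print([body for _, body in second])  # prints ['m1', 'm2']
```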

Kafka is more like TCP. It’s a stream.

You probably don’t need a separate message queue service if you have a transactional (probably “real SQL”) database. The “table as queue” pattern in SQL has been used since the 1970s at massive scale and was taught in my CSE 100-level classes back in the 1990s. The natural API for such a thing is basically the same as Rabbit/Kafka/SQS/whatever.

The ”massive scale” of the 1970s is very far from today’s ”ordinary scale”! Just saying that this approach really does not scale well and problems will start around high hundreds of thousands to low millions of messages queued.

i agree with you: if you have a transactional SQL database, it’s usually best to stick to this until it doesn’t scale.

it allows you to make the « add message to queue » part of a transaction that updates other tables, which is nice!
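A minimal sketch of the table-as-queue pattern, assuming SQLite and a hypothetical `jobs` table with a `status` column. `enqueue` deliberately does not commit, so the caller can put it in the same transaction as other writes; `claim` uses a conditional UPDATE so two workers cannot take the same row. On PostgreSQL, `SELECT ... FOR UPDATE SKIP LOCKED` is the usual way to do the claim step.

```python
import sqlite3


def enqueue(conn, payload):
    """Add a job; the caller decides when the transaction commits."""
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def claim(conn):
    """Mark the oldest pending job as running and return (id, payload)."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue is empty
        with conn:
            cur = conn.execute(
                "UPDATE jobs SET status = 'running' "
                "WHERE id = ? AND status = 'pending'",
                (row[0],),
            )
        if cur.rowcount == 1:
            return row
        # Another worker claimed this row first; look for the next one.


def complete(conn, job_id):
    """Acknowledge a finished job by deleting its row."""
    with conn:
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, "
    "status TEXT NOT NULL DEFAULT 'pending')"
)

with conn:  # both jobs become visible together, or not at all
    enqueue(conn, "send-welcome-email")
    enqueue(conn, "resize-avatar")

job = claim(conn)
complete(conn, job[0])
next_job = claim(conn)
```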

do you have any pointers? Just curious how this was taught in the old days

Oh boy. ”Roll your own queue” is really bad general advice. Lots of unknown unknowns on the way. And what if it’s for data integration into/out of said database?

Right now we're using an implicit message queue in the database, based on statuses. We have entities with statuses and cron jobs which run often, do some work, and change statuses. It's more like a state machine. But we plan to migrate this system to Kafka, because every status change implies a delay between cron jobs, and that makes the system somewhat slow.

This works fine until the table gets big.

Then most of your statuses are Complete and finding an ever decreasing percentage of Incomplete rows will result in essentially a table-scan.

Then that table scan will start to take longer than your cron interval or will biff through your IO or CPU and you've got problems.

Then you might try adding indexes or beefing up your queries and then you'll realize you've just traded 6 for 12/2 because now your index updates on writes are hurting you more than your scans on reads.

Then you'll have the brilliant idea to cull your data regularly or have different tables for Complete vs Incomplete.

And then you'll want to do this culling or row-move in a transaction because you've never bothered making the job idempotent because it was implicitly so in the database all along, and now your transaction locks will compete with your cron scanning or index updates and now you'll maybe put the status table(s) in a different database and work on idempotency.

And then maybe you'll take a step back, have a nap, and wake up to realize you've just implemented a queue but not very well and have tied its design to your specific business-flow and good luck adding retry logic or new status fields.

If you're gonna use a database as a queue, make it a different database server (or schema!) from your business data. Don't tie business transactions (completing the job) to database transactions (since ultimately there will almost always be more than just one data/processing system involved), and maybe really just adding a simple Redis or SQS thing from the beginning isn't going to cost you that much so might as well squeak it in while the costs are still low.

> This works fine until the table gets big.

> Then most of your statuses are Complete and finding an ever decreasing percentage of Incomplete rows will result in essentially a table-scan

This is absolutely not true. Performance of your lookups will depend on the total number of "pending" items and never on the table size.

Generally indexes on boolean columns are useless due to low cardinality, but in this specific example they'll work just fine. If you have 100 "pending" items in a table with billions of rows, a search on "pending" status will never trigger a full scan, only 100 lookups. You can even make your index partial, so essentially your index will be your mini-queue.
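To illustrate the partial-index point, here is a sketch using SQLite, which supports partial indexes. The `jobs` table, the `idx_jobs_pending` index name, and the 100,000-row data set with 100 pending rows are made up for the demo; other databases have their own syntax and planners.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")

# 100,000 rows, of which every 1000th (100 rows in total) is still pending.
conn.executemany(
    "INSERT INTO jobs (id, status) VALUES (?, ?)",
    ((i, "pending" if i % 1000 == 0 else "complete") for i in range(1, 100_001)),
)

# The index only contains the pending rows, so it stays tiny no matter
# how many completed rows pile up in the table.
conn.execute(
    "CREATE INDEX idx_jobs_pending ON jobs (status) WHERE status = 'pending'"
)
conn.commit()

# The literal 'pending' matters: the planner must be able to see that the
# query's WHERE clause implies the index's WHERE clause.
QUERY = "SELECT id FROM jobs WHERE status = 'pending'"


def query_plan(conn, sql):
    """Return the planner's description of how it will run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan = query_plan(conn, QUERY)
pending_ids = sorted(row[0] for row in conn.execute(QUERY))

print(plan)              # should mention idx_jobs_pending rather than a table scan
print(len(pending_ids))  # prints 100
```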

Idempotency is a valid point, but it is as hard with queues as it is with databases. Exactly-once delivery is generally not possible in distributed systems (in the real world at least, where you don't own every node). Queues won't magically solve it for you, they need to implement the same locking mechanisms as the database.
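One common way to live with at-least-once delivery is to make the consumer idempotent. A minimal sketch, assuming every message carries a unique id; the `processed` and `balance` tables and the `handle_once` function are hypothetical names for illustration.

```python
import sqlite3


def handle_once(conn, message_id, amount):
    """Apply a credit once per message_id, even if it is redelivered."""
    try:
        with conn:  # the dedup record and the effect commit together
            conn.execute(
                "INSERT INTO processed (message_id) VALUES (?)", (message_id,)
            )
            conn.execute("UPDATE balance SET total = total + ?", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the effect was already applied


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE balance (total INTEGER NOT NULL)")
conn.execute("INSERT INTO balance (total) VALUES (0)")
conn.commit()

# "msg-1" is delivered twice, as an at-least-once queue is allowed to do.
deliveries = [("msg-1", 10), ("msg-2", 5), ("msg-1", 10)]
results = [handle_once(conn, mid, amount) for mid, amount in deliveries]
total = conn.execute("SELECT total FROM balance").fetchone()[0]

print(results, total)  # prints [True, True, False] 15
```

This only works because the dedup record and the side effect live in the same transactional store; an effect in an external system needs its own idempotency key.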

Partially this is why AWS SQS boasts of being almost infinitely scalable, but limits you to 300 (!) rps for FIFO queues.

I love explaining how exactly-once isn’t achievable. It’s my favorite topic… it also can be annoying when you have two systems with an at-most-once vs at-least-once mismatch.

Background jobs creating other jobs which create other jobs and so on is a tough situation to be in. Big potential for race conditions and other bugs hiding in that complexity. One of those things where either you've made really good decisions along the way or really bad ones.

that was exactly the point I was trying to make! One of the things we aim to do with chiselstrike is to allow that transition to happen easily by providing a unifying interface.

I can't believe you (or anybody else) were able to read it. It was so painfully terribly written I bailed after the second section.

What are techniques that I could use to improve my writing? Always eager to learn.

I did have a bit of trouble reading it at first. But because I was determined to, I got through. Unfortunately after getting through, it made more sense than it did initially. :) So I'm struggling to think of useful feedback.

If this were just a discussion about backend architectures or message queue architectures or Chiselstrike it might feel more cohesive.

I think it's challenging to describe simply, because you've built a system that masks over these different architectures.

Super minor things: 1) personally I find memes in technical articles distracting but I know not everyone agrees. And 2) the differing graphics backgrounds and borders on the gray background, coupled with Medium's (two) sidebars is also a little distracting.

I have a hard time pinning down why it's so hard to read. I think it's because some of your sentences aren't actually full sentences.

Examples:

"But often, especially with nascent projects, we can’t afford."

"From full backend-as-a-service solutions like Firebase, to tools like ORMs to abstract and simplify dealing with databases."

"Simple, with an architecture that already internalizes all of the common issues seen when handling events."

You also sometimes use sentences that are too verbose or hard to parse.

Example:

A) "In this post we’ll show how the lack of good abstractions leads to build-your-own backends lacking a key component that is present in most modern backends: a message queue."

B) "In this post we’ll explain why message queues are important to a good back-end, and why developers may overlook them. (part that doesn't need to be said: because of the focus on database-centric abstractions.)"

A) "If you ask most people, a backend is comprised of a database where the data can be stored, a business logic layer to mediate access to that data, and a couple of authentication components. But if you ask modern backend engineers, the answer differs. Yes, you have all that, but there’s a major component missing in this picture: a message queue.

From modern alternatives like Redpanda, to established systems like Kafka, message queues are an integral part of most professional backends in use today."

B) "Most modern back-end engineers will attest that a message queue, such as Kafka or RedPanda, is crucial to a good back-end."

Generally I'd get rid of the conversational, quirky tone, I think it'll improve things stylistically.

For the content itself I'd write a clearer plan with titles and subtitles, and the idea you want to convey in each paragraph.

Also a good tip is to always write the introduction at the end, it makes it clearer in your mind what you're actually trying to convey.

I skimmed it (and am familiar with the tech background). Minus the memes (not a fan) it seems fine.

Related, possibly helpful: I get the feeling these days we write for 2 distinct sets of eyes on the internet: those looking at a large screen and those who digest articles on a 3x4 inch screen. For example, one can see your diagrams and read the text discussing them on a pc at the same time.

Hey, did anyone tell you that being a rude prick on the internet is an irritating hobby?

It's like you showed up just to dump on the author. You added no value here.

You fail to articulate why it was hard for you to read. You failed to articulate what might have been better. You failed to have a basic level of compassion and human decency in a way where you could even make a base-level attempt at wording your post as anything other than a snarky attack.

I'm personally sick and exhausted of commenters who behave like this. The people who contribute nothing but put in the extra effort to show up in comment sections to dump on people who are trying their best to share or build something.

> Hey, did anyone tell you that being a rude prick on the internet is an irritating hobby?

Factually, you are the party introducing personal insults into this conversation.

Hey! OP here.

I didn't know I was trying to sell a message queue, thanks for letting me know!

It's because you're the CEO of ChiselStrike.

I don't get it but there's a good chunk of people who are ready to jump on anything that looks even slightly commercial by even one single aspect. I've been told my personal blog was AI generated blog spam once in the past when I had literally nothing to sell. I also don't understand the fundamental issue with having a commercial involvement as long as you aren't trying to BS people or shove advertising down their throats.

I am! but Chiselstrike doesn't sell a message queue, so still don't understand how I'm selling one.

It's an abstraction over a message queue, which is only one feature among many.

But btw, as a pointy-haired dude I am becoming (although I am bald), if someone really wants to buy a message queue from me, I'll happily sell it!

In order to sell an "abstraction over a message queue" you have to have a message queue under that abstraction, right?

I don't think there is anything wrong with your article, you are quite up front about what you are selling, so personally it is no big deal.

I mean, they're selling something. According to the TFA, it's an abstraction that may or may not use a backend message queue, but your app shouldn't care. So they're not selling a message queue, but they're selling something.

Sure am, but chiselstrike is OSS as well. So no different than any OSS project with a monetization strategy. (github.com/chiselstrike/chiselstrike)

But what's the issue with making a post and discussing how a product one made solves an issue? GPT-3 and DALL-E are both products with a pay-per-play model. So is AWS and S3. We talk about those things all the time with no scrutiny or suspicion.

It's hard not to open https://www.chiselstrike.com/ and notice that you sell something and then assume your blog post is part of that endeavor.

My personal opinion is that if you don't want to come off as selling something because you're genuinely interested in tech, just publish under a different domain or pseudonym.

this is a bad take. people work on problems they find interesting. it follows that you write about them because you can. the author has worked on kernel hypervisors, databases and now doing stuff around code generation. btw their code is apache 2 and on github, what they sell is the hosted version.