by logisticseh 1399 days ago
Many of the message queues I have seen are only necessary because of personalization or analytics features that the user doesn't want and the business probably doesn't need.

When I try to estimate ROI on those features it's usually minuscule. A/B tests are unconvincing and the aggregate reports synthesized from the data look more like numerology than rigorous science. Moreover, senior leadership usually doesn't realize how much additional liability they take on by collecting and storing so much customer data.

It's not 2010. The large tech firms are your direct competitors and regulators across the world have caught up, including in many US states. If you have a primary product for which people will pay money, then the "surveillance agency as a side-business" model of the 2000s-2010s is probably a bad idea. It's a net expense that exposes you to liability and distracts your team.

Additional benefit: without real-time analytics and personalization, you can go even further without a message queue.


> because of personalization or analytics features

That's not the architectural reason for message queues in my experience.

Primarily, a message queue is used when reacting to some event is expensive enough to bottleneck overall application throughput. Instead of reacting to the event synchronously, you queue it and react to it asynchronously.
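
As a rough illustration of that pattern, here is a minimal Python sketch using only the standard library. Everything named here (`handle_event`, `slow_reaction`, `worker`, `run_demo`) is made up for the example, and an in-process `queue.Queue` stands in for a real broker:

```python
import queue
import threading
import time

events = queue.Queue()   # in-process stand-in for a real message queue
processed = []           # results of the expensive work


def slow_reaction(event):
    """Hypothetical expensive handler (e.g. a slow downstream call)."""
    time.sleep(0.01)
    return event * 2


def handle_event(event):
    """Fast path: record the event and return immediately."""
    events.put(event)


def worker():
    """Drain the queue at whatever pace the expensive work allows."""
    while True:
        event = events.get()
        if event is None:        # sentinel: stop the worker
            events.task_done()
            break
        processed.append(slow_reaction(event))
        events.task_done()


def run_demo(n=20):
    """Accept a burst of n events, then wait for the backlog to drain."""
    processed.clear()
    t = threading.Thread(target=worker)
    t.start()
    for i in range(n):
        handle_event(i)          # accepted right away, processed later
    events.put(None)
    events.join()                # block until every queued item is handled
    t.join()
    return list(processed)
```

The caller of `handle_event` never waits on `slow_reaction`; the cost is paid later by the worker. A real system would swap the in-process queue for something durable, since this one loses its backlog if the process dies.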

Some very common use cases:

1. Your UI. Operating systems use a message queue to capture and forward inputs. Your mouse and keyboard events are all queued in a message queue.

2. Webhooks. Many webhook origins have timeouts on responses and your application must respond within a given window or it will be throttled or downgraded. In this case, the best practice is to queue the event to be processed asynchronously (that queue can be a simple database table with your own logic and wrapper around it or something like AWS SQS, Azure Service Bus, or Google Pub/Sub).

3. Mitigating Throughput Bottlenecks. Most applications have a read/write asymmetry so it makes sense to optimize your architecture to scale for reads. But what if your application occasionally has to handle a burst of writes? Should you size your infrastructure for that case? One approach is to proxy the writes through a queue so you can size the infrastructure for a maximum throughput that is managed by the queue. For example, instead of 1000 concurrent writes per second, a queue can capture the write mutations and trickle out only 100 concurrent writes per second. Instead of sizing your application to scale to handle 1000 writes per second, you only need to size your queue to handle that scale.

4. Resiliency. If a message fails, it can be retried according to whatever heuristics make sense for the domain. Sure, you can use a simple loop to retry, but every message queue provides some mechanism for handling retries, failed message delivery, and so on. If you decide to roll your own and log a failed call into a database to try it again later...well, you've effectively captured a message in a custom queue.
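
To show how little is needed for the "simple database table" version of points 2 through 4, here is a hedged Python sketch using SQLite from the standard library. The table name `webhook_events`, the `MAX_ATTEMPTS` budget, the `batch_size` cap, and all function names are assumptions invented for this example, and it assumes a single worker:

```python
import sqlite3

MAX_ATTEMPTS = 3  # assumed retry budget; tune per domain


def open_queue(path=":memory:"):
    """Create the (hypothetical) webhook_events table used as a queue."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS webhook_events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 0,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    db.commit()
    return db


def enqueue(db, payload):
    """Called from the webhook endpoint: store the event, respond right away."""
    db.execute("INSERT INTO webhook_events (payload) VALUES (?)", (payload,))
    db.commit()


def process_pending(db, handler, batch_size=100):
    """One worker pass: try up to batch_size pending events, oldest first."""
    rows = db.execute(
        "SELECT id, payload, attempts FROM webhook_events "
        "WHERE status = 'pending' ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for event_id, payload, attempts in rows:
        try:
            handler(payload)
            status = "done"
        except Exception:
            # Leave it pending for a later pass unless the budget is spent.
            status = "failed" if attempts + 1 >= MAX_ATTEMPTS else "pending"
        db.execute(
            "UPDATE webhook_events SET attempts = attempts + 1, status = ? "
            "WHERE id = ?",
            (status, event_id),
        )
    db.commit()
```

Running `process_pending` on a timer with a fixed `batch_size` is one way to get the "trickle out" behavior from point 3: bursts land in the table, and the downstream system only ever sees the capped rate. With more than one worker you would also need some way to claim rows so two workers don't pick up the same event, which is roughly where a managed queue starts to earn its keep.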

User tracking and/or personalization tick 2-3 of your 4 boxes:

UI - Logging mouse movement, keyboard events, and other types of attention proxies.

Webhooks and/or Bottlenecks - calling out to either internal or third party classifiers, or more recently even generative models, for personalization based on user tracking data.

And I don't think this is just my unlucky experience. The fine article includes user tracking as one of only three explicitly enumerated reasons for queuing:

> Why? Because users increasingly expect a real-time experience. In use cases like order flows, webhooks, user tracking, etc. users expect to be able to see the new data in the user interface instantly, instead of having to wait for some background batch processing to periodically reload.

Of the explicitly enumerated motivations in the article:

1. "Order flows" - tracking/modification is often one of the higher latency items in order flows ("you might also like" / "what to order next" features).

2. "Webhooks" - often used for tracking/personalization

3. "User tracking" - ...this one is easy :)

Webhooks go far beyond personalization and tracking; they're a general-purpose integration pattern.

Yes, and I've worked on browser-based games that are built on top of webhooks. But what percentage of people could forgo webhooks entirely if they weren't doing any personalization or tracking? Or could at least get away with webhooks without any "real" queuing infra? I'd wager a large number.

Not really. Any payment system requires webhooks.

Shameless (but relevant!) plug: we built Svix to help people send webhooks from their platform. With Svix you don't have to use message queues for webhooks, at least not from the sender side: https://www.svix.com/

Note, though, that message queues are great: they make asynchronous operations and interactions with external services much more resilient.