by Supermancho 1805 days ago
> SQS is one of the best products that AWS offers.

I would never trust anyone who says this, regardless of who did what with it.

SQS Standard (which was the only option for a long time) has historically been a nightmare for almost any use case, including job queueing. I love creating a whole separate cache to ensure that when SQS delivers a message multiple times, I don't act on it more than once. That's just ONE issue that I don't need to have. When you get into FIFO to avoid that, your costs become ridiculous at scale AND there are still a (relatively small) number of outstanding issues with FIFO.

eg https://tomgregory.com/3-surprising-facts-about-aws-sqs-fifo...


It's really not hard to pick a natural key to use for idempotency in your persistence layer. If you don't want at least once delivery, why are you picking SQS instead of some random ephemeral queue like NATS?

I second the parent post. I've pushed billions of messages through SQS at a previous job and I can remember having issues with SQS availability exactly once (due to a systemic failure involving DynamoDB, where cascading failures took down almost everything (and all of our integrations) and would have hosed us completely if not for our durable-queue/s3-journal[0] at the edge). SQS is simple to use, does what it says, and has very good SDK support. A++++ Would build a business on top of again!

[0] https://github.com/Factual/s3-journal

This is a surprising take. Accounting for duplicate events is such a common pattern in distributed systems that it never occurred to me to think of it as an issue. What is your preferred approach?
I don't really understand this point.

I've found it's trivial to just have a unique ID in the message and check whether that ID already exists in a jobs table to avoid double processing.

I pretty much always have some metadata table needed for the job anyway, so it's not like I'm building something extra.
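To make the unique-ID check above concrete, here is a minimal sketch, not anyone's production code. It assumes each message carries an `id` field and uses an in-memory SQLite table as a stand-in for the jobs table; the names `make_jobs_db`, `claim_job`, and `handle` are hypothetical. The primary key does the deduplication, so a redelivered message is simply ignored.

```python
import sqlite3


def make_jobs_db():
    # In-memory SQLite stands in for whatever database holds the jobs table (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (message_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def claim_job(conn, message_id):
    """Return True only the first time a given message_id is seen."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO jobs (message_id, status) VALUES (?, 'started')",
            (message_id,),
        )
        # rowcount is 1 if the row was inserted, 0 if the primary key already existed.
        return cur.rowcount == 1


def handle(conn, message, processed):
    """Process a message at most once; `processed` collects the work done."""
    if not claim_job(conn, message["id"]):
        return False  # duplicate delivery, skip it
    processed.append(message["body"])
    return True


if __name__ == "__main__":
    conn = make_jobs_db()
    done = []
    msg = {"id": "order-123", "body": "transcode video"}
    print(handle(conn, msg, done))  # first delivery
    print(handle(conn, msg, done))  # redelivery of the same message
    print(done)
```

Claiming the ID before doing the work means a crash mid-job leaves a 'started' row behind, so a real consumer would also need a status update and some retry policy for stuck rows.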

jobs table? why not just use the database directly then?
Use the database directly? I hope you're not suggesting sharing a database between services...
Sharing an SQS queue between services is ok then? Even the most basic example for SQS queues from the AWS SA training course recommends using multiple queues for different classes of video transcoding. All of those benefits evaporate once you share a single queue; if you're just reading that queue to create a job in a database, then the queue provides no benefit.
I'm not quite clear what you mean by "sharing a single queue between services". If you mean having multiple different services reading from the same queue then no, I wouldn't do that. If you mean having a sending service and a receiving service share the same queue (either directly or indirectly) in the sense that the sender sends a message and the receiver reads it then I can't see any alternative to doing that. If I've misunderstood then would you be able to share what you were thinking of?

I interpret arduinomancer's post above to mean that if the receiving service must action a single message once and only once (which is sometimes but not always necessary) then you need to give each SQS message a unique ID, and that ID needs to be stored in a DB by the receiving service. I've used that pattern a few times and it works well (and doesn't result in unnecessary coupling between sender and receiver). In the past, I've used messaging systems that gave stronger transactional guarantees, but those systems were things like JMS or MQSeries and I don't really want to go back to those days.

> I'm not quite clear what you mean by "sharing a single queue between services".

I was referring to your (seemingly snarky, and I apologize if I mistook your intent) comment that you hoped I wasn't suggesting a shared database between services.

I understand and agree that this is what OP was stating. What I was saying was that having the consumers and producers sharing a queue is not that much different from having them share a database, but a database gives you other guarantees as well. (I am a big fan of queues and have been using SQS since 2007.)

In other words, the database is still going to be your bottleneck/SPOF and a queue just introduces needless complexity; you might as well just write to the DB directly and have a table in the DB (or something like a Redis set) to track the jobs.

A queue with indeterminate ordering is better aimed at idempotent jobs than jobs that will eventually need to be serialized by necessity or design.

The one legitimate "problem" in that list is #3, the hard limit on FIFO messages (20,000). The rest are expected behavior for a 'deliver exactly once and in order' queue. The use cases offered (a client that fails to process a message and therefore blocks subsequent messages, and somehow (?) expecting the queue to facilitate reprocessing previously consumed messages) are both misuses of such a queue.

SQS (standard and FIFO) has another problem that precludes using it in one case I have; the 256KB message size limit. Amazon has a Java-only workaround (Amazon SQS Extended Client Library for Java) that will spool large messages to/from S3 -- so clearly my case isn't unreasonable -- but there is nothing for the general case.
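For what it's worth, the spool-to-S3 idea can be sketched in a few lines in any language. The following is only an illustration of the pattern, not the Extended Client Library's actual format or API: a plain dict stands in for the S3 bucket, the envelope fields `inline` and `blob_key` are made up, and `MAX_INLINE_BYTES` assumes the 256KB limit mentioned above means 256 * 1024 bytes.

```python
import json
import uuid

# Assumption: the 256KB limit from the comment above, taken as 256 * 1024 bytes.
MAX_INLINE_BYTES = 256 * 1024


def wrap(body: str, blob_store: dict) -> str:
    """Return a queue-sized message, spooling large bodies to blob_store."""
    inline = json.dumps({"inline": body})
    if len(inline.encode("utf-8")) <= MAX_INLINE_BYTES:
        return inline
    # Too large for the queue: store the body elsewhere and send a pointer.
    key = str(uuid.uuid4())
    blob_store[key] = body  # stand-in for an object-store upload
    return json.dumps({"blob_key": key})


def unwrap(message: str, blob_store: dict) -> str:
    """Reverse of wrap: fetch the body from blob_store if it was spooled."""
    envelope = json.loads(message)
    if "blob_key" in envelope:
        return blob_store[envelope["blob_key"]]
    return envelope["inline"]


if __name__ == "__main__":
    store = {}
    small = wrap("hello", store)
    large = wrap("x" * (MAX_INLINE_BYTES + 1), store)
    print(len(small), len(large), len(store))
    print(unwrap(small, store))
```

A real version would also have to delete the stored object once the message has been consumed, which is where most of the extra complexity comes from.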

SQS FIFO is like 25-40% more expensive, but you're getting added benefit and additional guarantees in exchange.

For the most expensive tier: 40 cents per million API requests for plain SQS vs 50 cents per million API requests for FIFO SQS.
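As a quick sanity check on those two numbers, here is the arithmetic as a small sketch. The prices are the ones quoted in this comment (40 and 50 cents per million requests) and may not match current AWS pricing; the function names and the one-billion-request volume are made up for illustration.

```python
def premium_percent(standard_cents: int, fifo_cents: int) -> float:
    """How much more FIFO costs than Standard, as a percentage."""
    return (fifo_cents - standard_cents) / standard_cents * 100


def cost_dollars(requests: int, cents_per_million: int) -> float:
    """Cost in dollars for a number of API requests at a per-million price."""
    return requests * cents_per_million / 1_000_000 / 100


if __name__ == "__main__":
    # Prices as quoted above; one billion requests is an arbitrary example volume.
    print(premium_percent(40, 50))
    print(cost_dollars(1_000_000_000, 40))
    print(cost_dollars(1_000_000_000, 50))
```

At these quoted prices the top tier sits at the low end of the 25-40% range, and a billion requests comes to $400 on Standard versus $500 on FIFO.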

Like many AWS Services, it started off pretty basic or limited, but gradually and incrementally improved over time.

I started off pretty skeptical, but over time I've grown to be impressed with SQS and SQS FIFO.

I’m no longer with Airbnb / the project, but we did in fact use SQS Standard exclusively for job queueing (I don’t think we ever did explicitly add any support for FIFO). Our SLA was at-least-once (as most job queues are), so this didn’t matter as much. In practice it also doesn’t happen that often.