| HN Mirror



	by yzmtf2008 1805 days ago
	IMO, SQS is one of the best products that AWS offers. Airbnb built a queueing + scheduling system on SQS to great success [0] (disclaimer, I wrote this post). There are many others, e.g., Slack, that are building on top of SQS. [0]: https://medium.com/airbnb-engineering/dynein-building-a-dist...

2 comments

Supermancho 1805 days ago

> SQS is one of the best products that AWS offers.

I would never trust anyone who says this, regardless of who did what with it.

SQS Standard (which was the only option for a long time) has historically been a nightmare for almost any use case, including job queueing. I love creating a whole separate cache to ensure that when SQS delivers a message multiple times, I don't act on it more than once. That's just ONE issue that I don't need to have. When you move to FIFO to avoid that, your costs become ridiculous at scale AND you still have the (relatively small) number of FIFO issues that are outstanding.

eg https://tomgregory.com/3-surprising-facts-about-aws-sqs-fifo...


emidln 1805 days ago

It's really not hard to pick a natural key to use for idempotency in your persistence layer. If you don't want at least once delivery, why are you picking SQS instead of some random ephemeral queue like NATS?

I second the parent post. I've run billions of messages through SQS at a previous job and I can remember having issues with SQS availability exactly once (due to a systemic failure involving DynamoDB where cascading failures took down almost everything (and all of our integrations) that would have hosed us completely if not for our durable-queue/s3-journal[0] at the edge). SQS is simple to use, does what it says, and has very good SDK support. A++++ Would build a business on top of again!

[0] https://github.com/Factual/s3-journal


time0ut 1805 days ago

This is a surprising take. Accounting for duplicate events is such a common pattern in distributed systems that it never occurred to me to think of it as an issue. What is your preferred approach?


arduinomancer 1805 days ago

I don't really understand this point.

I've found it's trivial to just have a unique ID in the message and check whether that ID already exists in a jobs table to avoid double processing.

I pretty much always have some metadata table needed for the job anyway, so it's not like I'm building something extra.
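
A minimal sketch of that check, assuming a SQLite `jobs` table keyed by the message's unique ID. The table schema and the names `make_jobs_table` and `process_once` are hypothetical, chosen only for illustration; they are not part of any SQS SDK.

```python
import sqlite3


def make_jobs_table(conn):
    # Hypothetical schema: the PRIMARY KEY on message_id is what rejects duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " message_id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL)"
    )


def process_once(conn, message_id, handler):
    """Run handler only the first time message_id is seen.

    Returns True if the handler ran, False if the message was a duplicate.
    """
    with conn:  # one transaction: commit on success, roll back if handler raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO jobs (message_id, status) VALUES (?, 'done')",
            (message_id,),
        )
        if cur.rowcount == 0:
            return False  # ID already recorded, so skip the redelivered message
        handler()
    return True
```

If the handler raises, the insert is rolled back along with it, so a later redelivery of the same message is processed rather than skipped.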


gunapologist99 1805 days ago

jobs table? why not just use the database directly then?


chris_j 1805 days ago

Use the database directly? I hope you're not suggesting sharing a database between services...


gunapologist99 1805 days ago

Sharing an SQS queue between services is ok then? Even the most basic example for SQS queues from the AWS SA training course recommends using multiple queues for different classes of video transcoding. All of those benefits evaporate once you share a single queue; if you're just reading that queue to create a job in a database, then the queue provides no benefit.


chris_j 1805 days ago

I'm not quite clear what you mean by "sharing a single queue between services". If you mean having multiple different services reading from the same queue then no, I wouldn't do that. If you mean having a sending service and a receiving service share the same queue (either directly or indirectly) in the sense that the sender sends a message and the receiver reads it then I can't see any alternative to doing that. If I've misunderstood then would you be able to share what you were thinking of?

I interpret arduinomancer's post above to mean that if the receiving service must action a single message once and only once (which is sometimes but not always necessary) then you need to give each SQS message a unique ID, and that ID needs to be stored in a DB by the receiving service. I've used that pattern a few times and it works well (and doesn't result in unnecessary coupling between sender and receiver). In the past, I've used messaging systems that gave stronger transactional guarantees, but those systems were things like JMS or MQSeries and I don't really want to go back to those days.


topspin 1805 days ago

The one legitimate "problem" in that list is #3, the hard limit on FIFO messages (20,000). The rest are expected behavior for a 'deliver exactly once and in order' queue. The use cases offered (a client fails to process a message and therefore cannot consume the next one, blocking subsequent messages) and somehow (?) expecting the queue to facilitate reprocessing previously consumed messages are both misuses of such a queue.

SQS (standard and FIFO) has another problem that precludes using it in one case I have; the 256KB message size limit. Amazon has a Java-only workaround (Amazon SQS Extended Client Library for Java) that will spool large messages to/from S3 -- so clearly my case isn't unreasonable -- but there is nothing for the general case.
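
For the general case, the spooling idea can be sketched in a few lines. This is only an illustration under stated assumptions: a plain `dict` stands in for S3, the envelope format (`inline` / `ref`) is made up here, and `wrap` / `unwrap` are hypothetical names, not the API of the Java library mentioned above.

```python
import json
import uuid

MAX_BYTES = 256 * 1024  # the 256KB message size limit mentioned above


def wrap(payload: str, store: dict) -> str:
    """Return a queue-sized envelope, spooling oversized payloads to store."""
    envelope = json.dumps({"inline": payload})
    if len(envelope.encode("utf-8")) <= MAX_BYTES:
        return envelope  # small enough to send through the queue as-is
    key = str(uuid.uuid4())  # hypothetical object key for the spooled payload
    store[key] = payload  # stand-in for an upload to S3
    return json.dumps({"ref": key})


def unwrap(envelope: str, store: dict) -> str:
    """Recover the original payload on the receiving side."""
    msg = json.loads(envelope)
    if "ref" in msg:
        return store[msg["ref"]]  # stand-in for a download from S3
    return msg["inline"]
```

A real version would also need to delete the spooled object once the message is consumed, which this sketch leaves out.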


heartofgold 1805 days ago

SQS FIFO is like 25-40% more expensive, but you're getting added benefit and additional guarantees in exchange.

For the most expensive tier: 40 cents per million API requests for plain SQS vs 50 cents per million API requests for FIFO SQS.
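
As a rough check on those two prices, here is a small sketch of the arithmetic. The function names are hypothetical, the request volume is an arbitrary example, and the calculation ignores free tiers, batching, and any other pricing details.

```python
STANDARD_PER_MILLION = 0.40  # dollars per million requests, as quoted above
FIFO_PER_MILLION = 0.50      # dollars per million requests, as quoted above


def request_cost(requests: int, price_per_million: float) -> float:
    """Dollar cost of a number of API requests at a flat per-million price."""
    return requests / 1_000_000 * price_per_million


def fifo_premium() -> float:
    """Fractional price increase of FIFO over standard at these two prices."""
    return FIFO_PER_MILLION / STANDARD_PER_MILLION - 1


if __name__ == "__main__":
    volume = 1_000_000_000  # arbitrary example: one billion requests
    print(f"standard: ${request_cost(volume, STANDARD_PER_MILLION):.2f}")
    print(f"fifo:     ${request_cost(volume, FIFO_PER_MILLION):.2f}")
    print(f"premium:  {fifo_premium():.0%}")
```

At these two prices the premium works out to 25%, the low end of the range given above.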

Like many AWS Services, it started off pretty basic or limited, but gradually and incrementally improved over time.

I started off pretty skeptical, but over time I've grown to be impressed with SQS and SQS FIFO.


yzmtf2008 1805 days ago

I’m no longer with Airbnb / the project, but we did in fact use SQS Standard exclusively for job queueing (I don’t think we ever did explicitly add any support for FIFO). Our SLA was at-least-once (as most job queues are), so this didn’t matter as much. In practice it also doesn’t happen that often.


lloydatkinson 1803 days ago

You mentioned the tick rate of the scheduler? What is it? Every second, every minute, etc?
