by lutzh 1097 days ago
Thanks for your feedback! Delayed e-mail can of course have many other reasons than the service supposed to send it being down. But let's go with your example. Let's say you have a LoginService and a MessagingService, and when you request a password reset, the LoginService sends a ResetRequested event. Now if the MessagingService is down, you'd still be told to check your inbox, although the e-mail hasn't even been sent. That's a bad user experience. Agreed.

Your suggested solution is to make the communication between LoginService and MessagingService not event-driven, so that if the MessagingService is down, you can "hard error". But how much better is the experience then, really? As a user, you request the reset, and after a few seconds (by which point you're already annoyed and checking whether your internet connection is down), you get an error message saying you can't reset your password right now. I'd argue that option is equally unacceptable.

What you need to do is make the MessagingService better. In an event-driven system, each participating service makes a promise. The MessagingService must give service-level guarantees; it must be built and scaled to process all the events it's listening to. If it has an error, it's not you who should get the error message - you are the last person able to do something about it - but someone who can fix it. If you want a better user experience, don't build unreliable services and then apologize for the inconvenience. Build better (more reliable, more scalable) software. In the end, that's what event-driven architecture is about.
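To make the two designs being compared concrete, here is a rough Python sketch. Only LoginService, MessagingService and ResetRequested come from the discussion; the class split, method names, and the in-process queue standing in for a real event broker are all assumptions made for illustration.

```python
import queue
from dataclasses import dataclass


@dataclass
class ResetRequested:
    """Event emitted when a user asks for a password reset."""
    email: str


class MessagingService:
    """Hypothetical mail sender; `up=False` simulates an outage."""

    def __init__(self, up=True):
        self.up = up
        self.sent = []

    def send_reset_mail(self, email):
        if not self.up:
            raise ConnectionError("MessagingService is down")
        self.sent.append(email)


class EventDrivenLoginService:
    """Publishes an event and answers immediately, whatever the consumer's state."""

    def __init__(self, events):
        self.events = events  # stand-in for a broker topic

    def request_reset(self, email):
        self.events.put(ResetRequested(email))
        # The user is told this even if nobody is consuming events right now.
        return "Check your inbox."


class SynchronousLoginService:
    """Calls MessagingService directly, so an outage becomes a hard error."""

    def __init__(self, messaging):
        self.messaging = messaging

    def request_reset(self, email):
        try:
            self.messaging.send_reset_mail(email)
        except ConnectionError:
            return "You can't reset your password right now."
        return "Check your inbox."
```

In the event-driven variant the LoginService cannot tell whether the mail went out, which is why the argument above puts the burden on the MessagingService's own reliability and alerting.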
1 comment

> Delayed e-mail can of course have many other reasons than the service supposed to send it being down.

Of course. But I was presuming that your email service makes an API call to an email service provider rather than speaking SMTP itself. That API call has three possible cases:

- Ok: You're done, your ESP is responsible for SMTP retries. Maybe there's a webhook to re-enqueue something eventually after failures.

- Transient error: You retry. But you probably don't want to wait that long and there's probably no point in interleaving attempts to send other emails through the same API. So you may as well "block" (as in yield to the async scheduler) while waiting to retry.

- Persistent error: Here's when I suggest hard failing up to the user.
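A minimal sketch of that three-way handling, in Python with asyncio. The `send` callable, the `TransientError` and `PersistentError` exception types, and the retry parameters are all hypothetical and do not correspond to any real ESP's API. Treating exhausted retries the same as a persistent error is also an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable

OK_MESSAGE = "Check your inbox."
FAIL_MESSAGE = "We can't send a reset e-mail right now. Please try again later."


class TransientError(Exception):
    """Hypothetical: the ESP asks us to try again (timeout, rate limit)."""


class PersistentError(Exception):
    """Hypothetical: the ESP rejected the request or is hard down."""


async def send_reset_email(
    send: Callable[[str], Awaitable[None]],  # hypothetical ESP API call
    address: str,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Return the message to show the user after a reset request."""
    for attempt in range(attempts):
        try:
            await send(address)
            # Ok: from here on the ESP is responsible for SMTP retries.
            return OK_MESSAGE
        except TransientError:
            if attempt == attempts - 1:
                break  # out of retries; fall through to the hard failure
            # "Block" by yielding to the async scheduler before retrying.
            await asyncio.sleep(base_delay * 2 ** attempt)
        except PersistentError:
            break
    # Persistent error (or retries exhausted): hard fail up to the user.
    return FAIL_MESSAGE
```

Because the wait is an `await`, other requests keep being served while this one backs off; only the user who asked for the reset waits for a definite answer.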

> If it has an error, it's not you [the end user] who should get the error message - you are the last person able to do something about it - but someone who can fix it.

In the post you replied to, I walked through in detail how my experience would be noticeably better if I got an error message.

> If you want a better user experience, don't build unreliable services and then apologize for the inconvenience. Build better (more reliable, more scalable) software.

I agree you should try to eliminate every source of failure. I disagree that believing your software is perfect excuses you from writing code that provides the least unpleasant user experience possible when an error does happen. Email is an exceptional example: it is theoretically impossible to write a service that infallibly delivers email to {gmail,yahoo,outlook} within a short period of time.