by throwawaythekey 1328 days ago
It sounds like you've identified the issue yourself. You are relying on ordering when processing events. You need to either loosen that requirement or do better testing so that a failing event doesn't cause head-of-line blocking.
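To make the "loosen that requirement" option concrete, here is a minimal sketch. It assumes each event carries a `key` (hypothetical, e.g. an entity ID) and that ordering only matters between events with the same key. Under strict global ordering, one bad event stalls everything queued behind it. With per-key ordering, it only stalls later events for that same key. `Event`, `process`, `run_strict` and `run_per_key` are all illustrative names, not anything from the original setup.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Event:
    key: str      # hypothetical: the entity this event belongs to
    seq: int      # arrival order
    payload: str


def process(event: Event) -> None:
    # Hypothetical handler: a "bad" payload stands in for a poison event.
    if event.payload == "bad":
        raise ValueError(f"cannot process {event}")


def run_strict(events):
    """Global ordering: the first failure blocks everything after it."""
    done = []
    for e in events:
        try:
            process(e)
        except ValueError:
            break  # head-of-line blocking: nothing behind e can proceed
        done.append(e.seq)
    return done


def run_per_key(events):
    """Per-key ordering: a failure only stalls later events with the same key."""
    lanes = defaultdict(deque)
    for e in events:
        lanes[e.key].append(e)
    done = []
    for lane in lanes.values():
        while lane:
            e = lane[0]
            try:
                process(e)
            except ValueError:
                break  # this key's lane is stuck; other lanes keep going
            lane.popleft()
            done.append(e.seq)
    return sorted(done)


events = [
    Event("a", 1, "ok"),
    Event("b", 2, "bad"),
    Event("a", 3, "ok"),
    Event("b", 4, "ok"),
    Event("c", 5, "ok"),
]
print(run_strict(events))   # [1]
print(run_per_key(events))  # [1, 3, 5]
```

In a real system the "lanes" would more likely be partitions or per-key queues spread across workers. That is also roughly why adding workers can hide the problem: fewer unrelated events end up stuck behind the bad one.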

I don't know much about your application, but the fact that you can mitigate the problem by scaling the number of workers suggests that the ordering requirements might actually be fairly weak. As a worst case, you may be able to use a temporary blacklisting mechanism to push the failed event, and every event that depends on it, to a dead-letter queue (DLQ). By that stage, though, I think I would just prefer better testing.
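A rough sketch of the temporary-blacklisting idea follows. It assumes "events that depend on the failed one" can be identified by a shared `key` (a hypothetical field). When an event fails, it goes to the DLQ and its key is blacklisted for `ttl_seconds`. Later events with that key are also dead-lettered while the entry is live, so they stay in order next to the failed event. Events for other keys are processed normally. `BlacklistingConsumer`, the in-memory `dlq` list and the injectable `clock` are illustrative choices, not a specific queueing library's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    key: str      # hypothetical: events sharing a key depend on each other
    seq: int
    payload: str


class BlacklistingConsumer:
    def __init__(self, handler: Callable[[Event], None],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.handler = handler
        self.ttl = ttl_seconds
        self.clock = clock                   # injectable for testing
        self.blacklist: Dict[str, float] = {}  # key -> expiry time
        self.dlq: List[Event] = []           # stand-in for a real DLQ
        self.processed: List[int] = []

    def _is_blacklisted(self, key: str) -> bool:
        expiry = self.blacklist.get(key)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.blacklist[key]  # entry expired: key is live again
            return False
        return True

    def consume(self, event: Event) -> None:
        if self._is_blacklisted(event.key):
            # Dependent of an earlier failure: park it behind that failure.
            self.dlq.append(event)
            return
        try:
            self.handler(event)
        except Exception:
            self.dlq.append(event)
            self.blacklist[event.key] = self.clock() + self.ttl
            return
        self.processed.append(event.seq)


def handler(e: Event) -> None:
    if e.payload == "bad":
        raise ValueError(e)


now = [0.0]
consumer = BlacklistingConsumer(handler, ttl_seconds=10, clock=lambda: now[0])
for ev in [Event("a", 1, "ok"), Event("b", 2, "bad"), Event("a", 3, "ok"),
           Event("b", 4, "ok"), Event("c", 5, "ok")]:
    consumer.consume(ev)
print(consumer.processed)                # [1, 3, 5]
print([e.seq for e in consumer.dlq])     # [2, 4]
```

The TTL is the weak spot, which is part of why better testing is usually preferable. If the blacklist entry expires before someone replays the DLQ, new events for that key will be processed ahead of the dead-lettered ones, and the ordering you were relying on is broken anyway.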