Hacker News
by steakknife 2918 days ago
Interesting writeup. This is also a major issue for us at TradeIt (we do something similar but for stock brokers and portfolio/trading), as the brokers we integrate with are not always...ahem..."robust". We've found that our upstream users really appreciate that we can often tell them about brokers' service outages before the brokers even announce them (when the brokers even bother). Sometimes the brokers don't even realize their system is malfunctioning until we poke them to ask what's going on.

Our throughput numbers are much lower and our integrations are much fewer than Plaid's, so we have been able to get away with keeping a close eye on Graphite/Grafana for spikes in request failures/timeouts. Seems like eventually we will need to implement some kind of statistical monitoring and alerting.
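A minimal sketch of what that statistical alerting could look like, assuming per-interval failure and request counts are already available (e.g. pulled out of Graphite): a rolling z-score that flags an interval whose failure rate sits far above the recent baseline. The class name, window size, threshold, and `min_std` floor are hypothetical, illustrative choices, not anything TradeIt described.

```python
from collections import deque
from math import sqrt


class FailureSpikeDetector:
    """Hypothetical rolling z-score detector for request-failure spikes.

    Flags an interval whose failure rate is more than `threshold` standard
    deviations above the mean of the previous `window` intervals.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0, min_std: float = 0.01):
        self.window = window
        self.threshold = threshold
        # Floor on the std dev so a perfectly flat baseline doesn't make
        # every tiny blip look like an infinite z-score.
        self.min_std = min_std
        self.history = deque(maxlen=window)  # recent failure rates

    def observe(self, failures: int, requests: int) -> bool:
        """Record one interval; return True if it looks like a spike."""
        rate = failures / requests if requests else 0.0
        alert = False
        # Only judge once a full baseline window has been collected.
        if len(self.history) == self.window:
            mean = sum(self.history) / self.window
            var = sum((x - mean) ** 2 for x in self.history) / self.window
            std = max(sqrt(var), self.min_std)
            alert = (rate - mean) / std > self.threshold
        # Spikes are added to the baseline too, so a sustained outage
        # eventually stops alerting as it becomes the "new normal".
        self.history.append(rate)
        return alert


if __name__ == "__main__":
    detector = FailureSpikeDetector(window=5)
    for _ in range(6):
        detector.observe(1, 100)  # ~1% failure baseline
    print(detector.observe(20, 100))  # 20% failures -> True
```

A z-score on the failure *rate* rather than the raw count keeps the check meaningful when traffic volume swings; whether to keep spikes out of the baseline is a judgment call depending on how long you want a sustained outage to keep alerting.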


Grafana has that ability built in!