by ahk 4708 days ago
This seems to explain only a single faulty recharge, caused by the customer using the service with a zero balance and triggering the charge attempt. Why would there be multiple recharge attempts? Was the customer using the service multiple times in that period, each use triggering a recharge attempt, or was the code retrying the transaction? If it was the code, why would it restart the transaction from the top instead of retrying just the part that failed, the balance update?

I could have explained this better. Many usage events on the Twilio API generate a billing transaction (e.g. call, SMS message, phone number purchase, etc). When the transaction attempted to apply against a balance of zero and the customer had auto-recharge enabled, the billing system would trigger a charge attempt. With balances set to zero and read-only, each subsequent usage event would trigger a charge attempt, resulting in the erroneous charges.
So, the users most affected were some of your most active/high traffic ones? Ouch.
While it's unlikely that Appointment Reminder is one of Twilio's largest accounts, we do have a number of customers in the Eastern Standard time zone, and due to customer usage patterns we have a predictable spike in outgoing calls and SMS messages which happened to coincide with the early-morning PST Twilio problem.

Under normal circumstances, I have Twilio set up to bill us $500 if the balance ever dips below $500. That $500 is our rebilling increment. Each SMS message and phone call we made caused us to get charged our rebilling increment. We hit $3,500 before our credit card company started rejecting charges. I think that if they hadn't, we would likely have saturated our credit line.

I'm thrilled with Twilio's response to this issue, and to most other transient issues I've had as a Twilio customer. The system is mostly rock-solid reliable. I actually went to bed in the middle of this event (midnight, Japan time) because a) my systems were reporting that our messages were successfully going out (so no customer-visible downtime) and b) I had total confidence in Twilio to take care of things. And they did.

An incident like this is pretty painful for every customer affected. I hope this explanation, the prompt refund of the erroneous charges, and the credit show them that it is a pain we very much share.
> ".. Twilio usage that resulted in a billing transaction (e.g. 1 cent for a SMS message or a phone call) triggered the billing system to attempt a recharge using the credit card associated with the customer’s account. This only affected accounts with auto-recharge enabled."

> "Consequently, the billing system charged customer credit cards to increase account balances without being able to update the balances themselves. This root cause produced the billing incident of customer credit cards being charged repeatedly."

Sounds like when an action required funds from a user's balance (which their system thought was 0), it attempted to recharge their balance by charging their credit card. And since the system also could not write to the database (to increase the balance) the balance remained at 0. Thus, the system kept thinking it had to recharge the balance again.
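
To make that loop concrete, here is a minimal sketch of the failure mode as I read it. None of this is Twilio's actual code: the `Account` class, `bill_usage`, the 1-cent cost, and the $500 increment are all hypothetical names and numbers chosen for illustration. The only assumption taken from the post-mortem is the ordering: the card is charged first, and the balance write then fails because the store is read-only.

```python
from dataclasses import dataclass, field


class ReadOnlyError(Exception):
    """Raised when the (hypothetical) balance store rejects a write."""


@dataclass
class Account:
    balance_cents: int = 0
    recharge_cents: int = 50_000      # hypothetical $500 recharge increment
    read_only: bool = True            # models the balance store being read-only
    card_charges: list = field(default_factory=list)

    def write_balance(self, new_balance: int) -> None:
        if self.read_only:
            raise ReadOnlyError("balance store is read-only")
        self.balance_cents = new_balance


def bill_usage(account: Account, cost_cents: int) -> None:
    """Bill one usage event, auto-recharging if the balance is too low."""
    if account.balance_cents < cost_cents:
        # The card is charged before the balance is updated.
        account.card_charges.append(account.recharge_cents)
        try:
            account.write_balance(account.balance_cents + account.recharge_cents)
        except ReadOnlyError:
            pass  # money already taken, but the balance still reads 0
    try:
        account.write_balance(account.balance_cents - cost_cents)
    except ReadOnlyError:
        pass  # the usage debit cannot be recorded either


# Seven 1-cent usage events against a zeroed, read-only balance.
broken = Account()
for _ in range(7):
    bill_usage(broken, cost_cents=1)

# The same seven events when writes succeed.
healthy = Account(read_only=False)
for _ in range(7):
    bill_usage(healthy, cost_cents=1)

print(len(broken.card_charges), sum(broken.card_charges))
print(len(healthy.card_charges), healthy.balance_cents)
```

In the broken case every usage event produces a fresh card charge because the balance never leaves 0; in the healthy case only the first event recharges. Nobody is retrying anything: each new call or SMS independently re-runs the "balance too low, recharge" branch.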

I would assume from the article (I have no experience with Twilio) that there's some "purchase X more SMSes when I've fewer than Y SMSes left" feature. So whenever the customer uses their API to send an SMS (or make a call), the software would detect that there are fewer than a configured amount of messages left and would bill the customer to top off the account.

The actual billing would go through OK and the master database would likely get updated, but the frontend doing the sending wouldn't see the updated balance, causing it to purchase more credit.
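
A rough sketch of that guess, purely hypothetical since I don't know how Twilio's storage is laid out: `BillingStore`, `primary`, `replica`, and the threshold and increment values are all made up. It assumes the frontend decides whether to top up from a copy of the balance that has stopped receiving updates.

```python
class BillingStore:
    """Hypothetical store: writes go to the primary, the frontend reads a replica."""

    def __init__(self) -> None:
        self.primary = 0   # authoritative balance, in dollars
        self.replica = 0   # copy read by the frontend, possibly stale

    def replicate(self) -> None:
        """Copy the primary balance to the replica (not happening during the incident)."""
        self.replica = self.primary


def send_message(store: BillingStore, charges: list,
                 threshold: int = 500, increment: int = 500) -> None:
    """Top up whenever the balance the frontend can see is below the threshold."""
    if store.replica < threshold:
        charges.append(increment)      # the card is charged
        store.primary += increment     # the primary is updated...
        # ...but the replica is not, so the next call sees the old balance.


store = BillingStore()
charges = []
for _ in range(3):
    send_message(store, charges)
charges_while_stale = len(charges)     # one charge per message

store.replicate()                      # replication catches up
send_message(store, charges)
charges_after_sync = len(charges)      # no new charge once the balance is visible
```

Under this assumption the primary ends up holding all the purchased credit while the frontend keeps buying more, which differs from the post-mortem's description where the balance update itself could not be written.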