Hacker News
by jkat 4866 days ago
Depending on which side of Hanlon's razor you fall on, the only conclusion I can draw from this is that they are either incompetent or dishonest. I have a very hard time believing that this issue remained unknown to them for years.

As for the post, it's pretty much just documentation. I didn't see any apology. And the only promise of a better tomorrow is a vague "Working to better support concurrent-request Rails apps on Cedar".

7 comments

I also didn't see any mention of refunds for all of the extra dynos that were needed due to the degrading performance of their service - or all the extra support hours where they told everyone 'not our problem!'.
I wish I could vote this up many more times. It's exactly what I want to find out about.
What SLA did they explicitly fail to deliver on such that they should offer a rebate?
They apologized in the last post. Also, self-critical language like "fallen short of [our] promise" and "we failed to..." is a de facto apology and acceptance of responsibility even when the word 'sorry' only appeared earlier.

I can understand how this developed. Things worked well for most customers. Many of those with problems got them under control with more dynos or multi-worker setups. Heroku's Rails roots biased them towards a "keep it simple, throw hardware at it, or look for optimizations in the app/sql/db" mindset. Well, many of their Rails/Bamboo customers complaining about latency, even in the presence of this growing issue, may have also (or even primarily) had other app issues too. (When supporting developers, especially many beginning/free-plan developers, it doesn't take long for your conditional probability P((we have a real problem)|(customer thinks we have a problem)) to go very low, and P((customer app has a problem)|(customer thinks we have a problem)) to go very high.)

Even when Heroku had a unitary (and thus 'smart') router, they surely got latency complaints that were completely due to customer app issues or under-provisioning, so they stuck with the 'optimize app or throw dynos at it' recommendation for too long. And, when they habitually threw more hardware at the Bamboo routing mesh, they were unwittingly making the pile-up issues for Bamboo web dynos worse. Some key data about the uneven pre-accept queueing at dynos was missing, which, combined with habits of thought that had worked so far, gave them a blind spot.

Despite the growing issue, adding dynos at the margin would still always help (at least a little) — as well as adding to Heroku revenues. Even without any nefarious intent, a 'problem' that fits neatly into your self-conception ("we give people the dyno knob to handle any scaling issues and it works"), and is also correlated with rising business, may not be recognized promptly. That's just a natural human biased-perception issue, not incompetence or dishonesty.

In short, Heroku needs to hire someone with some operations research experience. This is a mathematical modelling problem, not really a code problem.

Break out Mathematica, Matlab or R and model the damn problem. Then go research the solutions already available (Hint: look at many grocery stores, queuing problems).
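
For what it's worth, here is a rough sketch of the kind of model meant above, in Python rather than Mathematica, Matlab or R. Everything in it is an assumption for illustration and not Heroku's actual setup: Poisson arrivals, exponentially distributed request times, dynos that serve one request at a time, and made-up function names and parameters. It compares a router that assigns each request to a random dyno with a single shared line (the grocery-store "next free register" approach).

```python
import heapq
import random


def make_requests(n, arrival_rate, mean_service, seed=0):
    """Return n (arrival_time, service_time) pairs.

    Assumed workload: Poisson arrivals, exponential service times.
    """
    rng = random.Random(seed)
    t = 0.0
    reqs = []
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        reqs.append((t, rng.expovariate(1.0 / mean_service)))
    return reqs


def mean_wait_random(reqs, dynos, seed=1):
    """Router picks a dyno at random; each dyno has its own FIFO queue."""
    rng = random.Random(seed)
    free_at = [0.0] * dynos  # time at which each dyno finishes its backlog
    total = 0.0
    for arrival, service in reqs:
        i = rng.randrange(dynos)
        start = max(arrival, free_at[i])
        total += start - arrival
        free_at[i] = start + service
    return total / len(reqs)


def mean_wait_shared(reqs, dynos):
    """One shared FIFO line; the next request goes to the first free dyno."""
    free_at = [0.0] * dynos  # a list of equal values is already a valid heap
    total = 0.0
    for arrival, service in reqs:
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)
        total += start - arrival
        heapq.heappush(free_at, start + service)
    return total / len(reqs)


if __name__ == "__main__":
    # Hypothetical load: 20 dynos, 100 ms mean request time, 80% utilization.
    reqs = make_requests(20000, arrival_rate=160.0, mean_service=0.1)
    print("random routing, mean wait (s):", mean_wait_random(reqs, 20))
    print("shared queue,   mean wait (s):", mean_wait_shared(reqs, 20))
```

Both policies are fed the same request stream, so any difference in mean wait comes from the routing rule alone. Swapping in a heavier-tailed service-time distribution would be the obvious next experiment.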

I think apologies are over-demanded by our somewhat hysterical media that likes nothing better than to humble or humiliate a public figure (because it sells papers); and this flows through into expectations of private and corporate behaviour. But I've never had much use for apologies from other people. Years of abuse make "sorry" an entirely debased term in my lexicon. I've seen statements of regret that omit the word and are all the more sincere for it.

Much more useful than an apology is an acceptance of fault (which is not the same thing); an expression of desire to improve, and a sincere and demonstrable commitment to doing so.

[NB: I don't mean to imply that Heroku have necessarily achieved all of that here]

> I didn't see any apology. And the only promise of a better tomorrow is a vague "Working to better support concurrent-request Rails apps on Cedar".

-------

They apologized in another earlier post. https://blog.heroku.com/archives/2013/2/15/bamboo_routing_pe...

"We failed to explain how our product works. We failed to help our customers scale. We failed our community at large. I want to personally apologize, and commit to resolving this issue"

> ... they are either incompetent or dishonest ...

Exactly. If they didn't know, they should have. If they did know, well, ...

I fully agree that they are either incompetent or dishonest. I hope this response gets more press because Heroku had better be beyond perfect from this point on. There is no excuse for this.
Your razor presents a false dilemma. They may be very competent but have no intention of catering to non-concurrent applications, either because they did not think about the scenario or because the way RoR operates is silly.