by bgentry 5397 days ago
Out of curiosity, what makes you think our data services aren't reliable or scalable?
3 comments

I'll jump in here, since you asked. Bad timing. About 2 hours ago we started getting these completely abstract errors 15 minutes before we were due to present to some VC folks:

2011-09-15T18:34:33+00:00 heroku[router]: Error H12 (Request timeout) -> GET [redacted]-staging.heroku.com/ dyno=web.1 queue= wait= service=30000ms status=503 bytes=0

I have absolutely no indication of what part of our stack is having trouble. After much freaking out, we spun up an entirely new app and demoed with the seed data. This weird sort of stuff has been happening to us at an alarming rate over the last 3 months or so. Not being able to deploy, having to put in a ticket, and waiting 24 hours for someone in support to fix it is another example.

Please don't take this the wrong way: I love you guys as people. I pushed for Heroku adoption at our shop. I absolutely love the concept of Heroku. Until a few months ago I felt like you guys were doing it way better than anyone else. (Left EngineYard to migrate everything to Heroku.) But these past few months have been really scary. There is a growing consensus at the office that we'll end up migrating away from Heroku to a platform where we can actually understand what's going on and be responsible for what's going on under the hood. After I began using Heroku, I never thought I'd want to go back to that again.

EDIT: Provided a more clear example that is less obviously suspect of a timeout.

Did you try contacting support? Looking at that log line, I can tell you that it looks like your app took too long to respond to the request (service=30000ms) and our routing mesh timed it out (status=503, Error H12 (Request timeout)).

If there's anything we can help with, definitely get in touch. Our support team is top notch.
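
For anyone who wants to check for this pattern in their own logs, here is a minimal sketch that pulls the fields out of the router line quoted earlier in the thread. The function names are made up for illustration, and the 30000 ms threshold is an assumption taken from the service= value in that one line, not from any documented Heroku setting.

```python
import re

# The router log line quoted earlier in the thread, copied as-is.
LOG_LINE = (
    "2011-09-15T18:34:33+00:00 heroku[router]: Error H12 (Request timeout) -> "
    "GET [redacted]-staging.heroku.com/ dyno=web.1 queue= wait= "
    "service=30000ms status=503 bytes=0"
)


def parse_router_line(line):
    """Return the key=value fields plus the error code, if one is present."""
    # Values may be empty (e.g. "queue="), so \S* allows zero characters.
    fields = dict(re.findall(r"(\w+)=(\S*)", line))
    error = re.search(r"Error (H\d+) \(([^)]*)\)", line)
    if error:
        fields["error_code"] = error.group(1)
        fields["error_desc"] = error.group(2)
    return fields


def looks_like_app_timeout(fields, limit_ms=30000):
    """True when the request got a 503 and service time reached limit_ms.

    limit_ms is an assumed threshold based on the quoted log line.
    """
    service = fields.get("service", "")
    if not service.endswith("ms") or not service[:-2].isdigit():
        return False
    return fields.get("status") == "503" and int(service[:-2]) >= limit_ms


if __name__ == "__main__":
    parsed = parse_router_line(LOG_LINE)
    print(parsed["error_code"], parsed["service"], looks_like_app_timeout(parsed))
```

This only tells you that the app did not answer in time; it says nothing about which part of the stack behind the dyno was slow.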

You're right, the support team is very friendly and supportive. But my resolution times for recent issues have been 24-hour turnarounds, even when we were feeling some level of pain and labeled it "high". In this instance, which was definitely an "urgent", I just didn't have any confidence the issue would be resolved in time. Thankfully when we've had issues like this (mostly with development/staging environments) it's always easy enough to just spin up another app. But that doesn't make me very comfortable having my production stuff there.
Are you on the shared hosting DB plan? Are you using more than 1 dyno?

We are running 4 dynos, 2 workers, and the Ronin DB plan, and we haven't had any problems other than a 10-minute downtime due to a bad deploy by Heroku.

Just eyeballing it, we have about 40-50 apps using the shared DB. We don't have any dedicated DB stuff with Heroku, but we use varying dyno levels. (Most apps idle at 1, though.)
Probably that the cheap plans are on shared DBs and it's very expensive to use dedicated DBs and the dedicated boxes appear to be single instances of Postgres.

I'm not complaining, but what is the official Heroku approach to things like DB sharding, etc.?

They actually have a very impressive dedicated database service: https://postgres.heroku.com/

It supports scaling vertically by throwing in more cache, faster CPU, etc. Once you outgrow that, it supports horizontal scaling by replicating to read-only slaves.
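
To make the read-only replica idea concrete, here is a rough sketch of how an app might split traffic: writes go to the primary, reads rotate across replicas. This is not Heroku's API; the class name, the connection labels, and the "starts with SELECT" check are all assumptions for illustration.

```python
import itertools


class ReplicaRouter:
    """Choose a target: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._reads = itertools.cycle(replicas or [primary])

    def target_for(self, sql):
        # Naive classification (an assumption): anything that is not a plain
        # SELECT is treated as a write and sent to the primary.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._reads) if is_read else self.primary


if __name__ == "__main__":
    # Hypothetical connection labels; real code would hold connection objects.
    router = ReplicaRouter("primary-db", ["replica-1", "replica-2"])
    print(router.target_for("SELECT * FROM users"))
    print(router.target_for("INSERT INTO users (name) VALUES ('a')"))
```

Note that this only spreads read load. Replicas can lag behind the primary, so a read issued right after a write may not see it, and write throughput is still limited by the single primary.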

I'm aware as we are running on heroku here, but it gets expensive quick and your options are more limited than if you roll your own. Then again, that goes for the whole platform.
I don't see how that has anything to do with the statement that I was referring to: "[their database services] don't seem reliable or scalable."
Unfortunately there is no official approach. The problem of partitioning and inconsistency to get more scalability is still evolving quite a bit.

I think things could be better, but it's not crystal clear what the right trade-off is for many people.

My experience has been that the shared dbs are more performant than Ronin.

I have had a couple instances get inexplicably slow and needed to relaunch.

We're also looking into offloading our database needs elsewhere.

If you have any ticket links or more information about your problems, please let us know. "Inexplicably slow" should never happen, especially with a dedicated DB.