by iBotPeaches 1543 days ago
It seems like we haven't had a non-robot update on the status page in days, even though this now seems like a daily occurrence. I figure at this point we'd get something of why this is happening.

I also don't appreciate our builds freezing, unable to be cancelled and then eating up hundreds of minutes.

4 comments

Billing should always be built on a "ping" IMO and not start/stop hooks. The latter is shockingly bad for customers during times of unreliability. The former sounds stupid and requires more infrastructure from the one offering the service, but I think it's more fair.
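To make the difference concrete, here is a minimal sketch of the two billing models. It is not how GitHub actually meters Actions minutes; `HeartbeatMeter`, `start_stop_seconds`, and the 60-second interval are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMeter:
    """Bills only for intervals confirmed by a ping from the running job."""

    interval_s: int = 60  # assumed ping period, not a real GitHub value
    pings: list = field(default_factory=list)  # timestamps in seconds

    def ping(self, ts: int) -> None:
        # Record a sign of life from the runner.
        self.pings.append(ts)

    def billable_seconds(self) -> int:
        # Each received ping confirms one interval of real work.
        return len(self.pings) * self.interval_s


def start_stop_seconds(start_ts: int, stop_ts: int) -> int:
    # Start/stop hooks bill the whole span, frozen or not.
    return stop_ts - start_ts


# A job runs for three minutes, then freezes during an outage;
# the stop hook only fires an hour after the start.
meter = HeartbeatMeter(interval_s=60)
for ts in (60, 120, 180):
    meter.ping(ts)

frozen_bill = start_stop_seconds(0, 3600)
ping_bill = meter.billable_seconds()
print(frozen_bill, ping_bill)
```

Under these assumptions the customer's exposure during an outage is capped at roughly one ping interval, at the cost of the provider handling one write per job per interval.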

I haven't used GA in a way where it actually cost me anything, but having minutes just tick away while you can't do anything is really stupid if that's the case.

Edit: Another sane solution would probably be to record outage periods and have Billing automatically reconcile for every customer when invoicing. This would require them to admit the outage durations however, so it may be flawed from a human perspective.
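A rough sketch of that reconciliation, assuming the provider keeps a list of outage windows as (start, end) timestamps that do not overlap each other. The function name and data shapes are hypothetical, not anything GitHub exposes.

```python
def billable_after_outages(job_start, job_end, outages):
    """Return job seconds minus any time that overlaps a recorded outage.

    outages: list of (start, end) tuples, assumed non-overlapping.
    """
    credited = 0
    for o_start, o_end in outages:
        # Length of the intersection between the job and this outage.
        overlap = min(job_end, o_end) - max(job_start, o_start)
        if overlap > 0:
            credited += overlap
    return (job_end - job_start) - credited


# One-hour job, with a 20-minute outage in the middle and an
# unrelated outage later that does not touch the job at all.
billed = billable_after_outages(0, 3600, [(600, 1800), (5000, 6000)])
print(billed)
```

The arithmetic is trivial; the hard part, as noted, is that the outage windows have to be published honestly for the credit to mean anything.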

The "ping" solution is an interesting one that I haven't seen proposed before.

At what rate would you do these pings? I don't know how upgrading/downgrading works at GitHub, but if they do any sort of refund/credit when you downgrade, it seems like there are some interesting implications for abusing the system (e.g. upgrading/downgrading between pings for "free" service if the time between them is too long) versus performance (e.g. how do you update all users per ping in a timely manner if the time between them is too short?).

Would love to read up more on this approach; seems interesting!

> I figure at this point we'd get something of why this is happening.

I've created a new discussion in their feedback repo asking for this. Three major outages in a week could really do with a post-mortem: https://github.com/github/feedback/discussions/13344

I suggest you add the timeout-minutes property on the job/step, so even if the web interface isn't responsive the job times out eventually. Saves you from spending time emailing support about consumed minutes.

Of course, assuming that a future bug won't affect timeout-minutes itself.
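For anyone who hasn't set it before, something like the following should work. The workflow name, the job name, the `make test` command, and the specific limits are placeholders; `timeout-minutes` is accepted at both the job and the step level, and the job-level default is 360 minutes if you leave it out.

```yaml
name: build
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    # Cancel the whole job if it runs longer than 15 minutes.
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        # Placeholder command; substitute your own build or test step.
        run: make test
        # Tighter limit for this single step.
        timeout-minutes: 10
```

Pick limits a bit above your normal run time so a healthy build is never cut off.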

Do they give you the minutes back if there's an incident during the period where a job is running?
You will have to contact them for them to credit you. That's what we did.
This is totally unsurprising and also totally unacceptable IMO. They should automatically wipe out all build minute usage during outages for every account if they insist on architecting their system in this way.