by dnsmichi 1078 days ago
GitLab team member here. Thanks for asking.

Incidents come in different types. For example, when an application bug or performance regression is discovered, the response can involve reverting MRs and rolling back releases. The Platform, Delivery group has top-level responsibility for ensuring continuous delivery of the GitLab application software to GitLab SaaS: https://about.gitlab.com/handbook/engineering/infrastructure...

Other incidents may involve hardware or infrastructure failures, or a combination of both, for example an infrastructure failure that renders GitLab application services unavailable. These incidents require cross-functional collaboration among infrastructure, product, engineering, and other teams.

For a better understanding, it helps to review the incident management handbook: https://about.gitlab.com/handbook/engineering/infrastructure...

Additional helpful information:

- The GitLab.com SaaS production architecture is documented in https://about.gitlab.com/handbook/engineering/infrastructure...

- The Monitoring of GitLab.com handbook provides insights into monitoring workflows, incident management, SLAs, etc. https://about.gitlab.com/handbook/engineering/monitoring/

- Runbooks https://about.gitlab.com/handbook/engineering/infrastructure...

For the current incident discussed in this HN thread, you can follow the review issue at https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1... to learn more.