I get the impression that Gitlab focuses a little too much on shipping new features with every release. Maybe they should spend more time on running their infrastructure reliably and getting their engineering practices up to speed. The recent events will raise a red flag with enterprises who might be potential Gitlab customers, and that directly affects their bottom line.
Yeah, I run Gitlab-EE internally and am aware of gitlab.com being a test bed but I can definitely see how others less familiar with Gitlab aren't aware of that.
Taking a quick glance at their site I can't even find anything about it basically being the test bed. There used to be some lines last year that said something along the lines that gitlab.com was known to be unstable and they recommended important projects to be self hosted. Maybe it was in the docs, I forget.
They should probably emphasize that somewhere if that's still the route they're going, although I feel like I did read they're working on major stability/infrastructure upgrades to solve those issues.
I don't pay much attention to gitlab.com since I use my own ee.
The real gem (pun intended) of Gitlab is not Gitlab.com, but the open source self-hosted version.
Chances are I don't want them to improve their infrastructure to the point where it can be a Github 2.0, because that probably means the setup and ongoing maintenance requirements of the self-hosted version will become excessive to support a scale that nobody using it has.
I questioned their engineering competence during their last outage and the general responses I got were very dismissive. Gitlab was able to spin the last disaster into a publicity stunt with the live streaming, but I feel like that was a fluke. Bro coding and "openness" will only garner you so much goodwill with paying customers who are more concerned with availability.
The unavailability of a code repository might be more of an inconvenience for a single developer, but in an enterprise environment with teams of coders, being able to quickly disseminate code changes can be critical. There, the unavailability of a source repository becomes a huge liability and a waste of man-hours.
Which is why I have yet to see a company with an important repo on gitlab.com - but Gitlab local installs? Aw yeah! (Gitlab EE is their main focus, IMHO - providing a public Github alternative seems to be an afterthought.)
Unfortunately, events like this are one of the reasons I never was able to go all in on Gitlab. When I first started trying it out the performance was not great (it is pretty good these days though) and they just seemed to have more of these smallish events. I realize Github also has problems but it feels like they are much less frequent. I don't have any data to back that up though.
Either way, went self hosted recently so now I only worry about my server haha.
For sure, and that's one of the reasons I think they have a chance of at least getting to the same level as Github. I've actually worked at two Fortune 500 companies that are using the self-hosted Gitlab, which to me says a lot. I just wish the hosted option was a bit more... stable.
> Not blaming Gitlab for bad practices or anything, i'm just curious.
On the contrary, the backup snafu was caused by a series of bad practices. If that's how backups are handled I wouldn't be surprised if the rest of the testing infrastructure has issues as well. Heck, I'd be surprised if it didn't!
Particularly because a solid testing infrastructure works in tandem with your backup processes by restoring recent backups.
Nothing tests new code better than running it on a production restore and nothing validates backups better than using them on a regular basis for testing.
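To make the "restore your backups regularly" point concrete, here's a rough sketch of a post-restore sanity check. It uses SQLite from the Python standard library purely so it runs anywhere; I have no idea what Gitlab's actual tooling looks like, and `table_counts`, `check_restore`, and the `projects` table are names I made up for illustration.

```python
import sqlite3


def table_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def check_restore(restored, expected_min_rows):
    """List problems found in a restored database (hypothetical checks only)."""
    counts = table_counts(restored)
    problems = []
    for table, minimum in expected_min_rows.items():
        if table not in counts:
            problems.append(f"missing table: {table}")
        elif counts[table] < minimum:
            problems.append(
                f"{table}: {counts[table]} rows, expected >= {minimum}"
            )
    return problems


# Stand-in for production: an in-memory database with a little data.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
production.executemany(
    "INSERT INTO projects (name) VALUES (?)", [("a",), ("b",)]
)
production.commit()

# "Restore" into a scratch database, then verify the result is usable.
restored = sqlite3.connect(":memory:")
production.backup(restored)

good = check_restore(restored, {"projects": 2})
# An empty database stands in for a backup job that silently produced nothing.
bad = check_restore(sqlite3.connect(":memory:"), {"projects": 2})
```

The useful part is the second call: a backup job that "succeeds" but writes an empty file only gets caught if something actually opens the restore and looks inside.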
Even with a staging server, things can pass testing but fail in production if the staging environment provides an imperfect simulation of the production environment - and that's almost inevitable.
For example, your staging environment servers should be connecting to a different database with a different password. If the password's right in the staging config but wrong (or missing) in the production config, things that work in staging can fail in production.
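A cheap guard against that particular failure is to check every environment's config for the same required settings before deploying. This is only a sketch under my own assumptions: the key names, hostnames, and the `missing_settings` helper are hypothetical, and real configs would come from files or a secrets store rather than inline dicts.

```python
# Settings every environment is assumed to need (hypothetical names).
REQUIRED_KEYS = ("db_host", "db_user", "db_password")


def missing_settings(configs, required=REQUIRED_KEYS):
    """Return {environment: [keys that are absent or empty]}."""
    problems = {}
    for env, cfg in configs.items():
        missing = [key for key in required if not cfg.get(key)]
        if missing:
            problems[env] = missing
    return problems


configs = {
    "staging": {
        "db_host": "db.staging.internal",
        "db_user": "app",
        "db_password": "staging-secret",
    },
    "production": {
        "db_host": "db.prod.internal",
        "db_user": "app",
        "db_password": "",  # the "missing in production" case
    },
}

result = missing_settings(configs)
```

Here `result` flags production's empty `db_password` while staging passes. It can't tell you whether a password is *correct*, only that it exists, so it narrows the gap between staging and production rather than closing it.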
> things can pass testing but fail in production if the staging environment provides an imperfect simulation of the production environment
Your staging environment should match production, or it's not really staging at that point. It doesn't have to match it in _size_, just structure and process. Ignoring data loss, if you can't quickly switch staging to production it's not really staging. It's just a dorky test environment masquerading as a staging environment. It's also surprisingly not that difficult (how difficult depends on the type of data you're interacting with, and how isolated it needs to be) to "forward" a slice of real-world traffic to your staging environment and monitor it for some duration of time.
>For example, your staging environment servers should be connecting to a different database with a different password.
Handled by proper CI/CD pipelines. Completely irrelevant to deploying new features, configuration for production specific users/passwords happens on the sysadmin/devops side of things.
Isn't this exactly what CI is supposed to prevent?
CI is only a facilitator; if their test coverage or quality isn't as good as it could be, it won't make much difference. Also, if it's due to load, I'm not sure how much load testing they would do as part of CI. Having CI and writing automated tests is something everyone seems to agree in theory is a good idea, but in my experience hardly anyone does it well because writing features always trumps writing tests. I am not talking about Gitlab specifically, I know absolutely nothing about their setup, only in general.
True story, I am involved with a startup that offers cloud based storage/reporting of test results (https://www.tesults.com) and my colleague just emailed the CTO of Gitlab yesterday to offer a promotion on a plan, very odd indeed to see this story on HN the next day!
I'm rapidly starting to question my use of mid-tier web services. Who else is operating like this? CI/CD, Staging, downtime playbooks, backup playbooks, all of this or any combination of it would have been a good idea. Folks, I just want to work without my tools failing so that I can go home and think about something else.
It's nice that you can spell and all, but bitbucket is also free and we don't see this type of issue there. Or github, or gmail, or google analytics, or...well, you get the point I hope. Free is not a synonym for unreliable, so if a company wants me to sign up to their product and doesn't tell me it's unstable, I'm not going to be terribly impressed if it fails. A clear notice on the homepage would sort this out. Heck, they can even link over to the enterprise edition for anyone who doesn't want to take the risk.
If you mean release-wise it's in step with CE. I host an ee server. I've had very few issues with the actual hosting of it over the last year. Their Omnibus system is fantastic. Most issues are UX/UI related when they break a button or what not.
Gitlab is praised for being transparent in everything they do, so where is the backup infrastructure policy that they should now have in place? I'd like to see that situation proven resolved before we discuss rewriting their front end with Vue.js and any other new deployments.
I understand gitlab to be a test bed. But at least this time they didn't delete the wrong directory. I know, I'm late to the party since service has been restored.
Doesn't look like they are, to me. Their main domain's IP address is owned by Microsoft (so, using Azure?), but the status page IP is in a block owned by Digital Ocean.
Edit: I could have sworn I refreshed the page before replying to make sure someone else hadn't already responded, and I didn't see your comment, jschulenklopper. Scary how similar they are lol.