| HN Mirror


by tmaczukin 2851 days ago

> - we do not use kubernetes so everything CD is off the plate for us (environment and monitoring tab are useless)

Environments can be useful even without K8S integration. They're useful e.g. for the review apps feature (https://docs.gitlab.com/ee/ci/review_apps/index.html), which doesn't need to be hosted on K8S. Look at https://gitlab.com/gitlab-org/gitlab-runner/environments, where we're using environments to track our releases, e.g. the download pages hosted on AWS S3. Another example is https://gitlab.com/gitlab-com/www-gitlab-com/environments - again, our about.gitlab.com website has each MR deployed as a review app without using K8S, but the environments feature is used to track all deployments, link them from the MR page, and automatically delete review deployments when the MR is merged or closed.
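For anyone who hasn't set this up: a review app without K8S needs little more than an `environment` block in `.gitlab-ci.yml`. Below is a minimal sketch, not the configuration of either project linked above. The job names, the two shell scripts and the example.com URL are placeholders made up for illustration.

```yaml
# Sketch only: deploy-review.sh, remove-review.sh and the URL are hypothetical.
deploy_review:
  stage: deploy
  script:
    - ./deploy-review.sh "$CI_COMMIT_REF_SLUG"   # e.g. sync a static site to a bucket or VM
  environment:
    name: review/$CI_COMMIT_REF_SLUG             # one environment per branch
    url: https://$CI_COMMIT_REF_SLUG.review.example.com
    on_stop: stop_review                         # job to run when the environment is stopped
  only:
    - branches
  except:
    - master

stop_review:
  stage: deploy
  script:
    - ./remove-review.sh "$CI_COMMIT_REF_SLUG"
  variables:
    GIT_STRATEGY: none                           # the branch may already be deleted
  when: manual
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
```

The `on_stop` / `action: stop` pair is what lets GitLab clean up the review deployment once the branch goes away.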

> - DO NOT USE THE BUILT IN CACHE, it's super slow and will fail unexpectedly (simply do cp to s3 and it will never fail)

Are you referring to the cache configured for Shared Runners on GitLab.com, or to the cache feature in general?

I have to agree that we had many strange problems with the cache for Shared Runners on GitLab.com in the past. Even now the feature doesn't always work as we would like, and we're already thinking about how to improve it: https://gitlab.com/gitlab-com/infrastructure/issues/4565.

But in general - I can't agree that the feature doesn't work and should not be used. Most of the time we've had no problems using the distributed cache with S3. When the cache servers are stable, the feature just works. I also can't agree that a manual copy to S3 will be faster than the copy to S3 made by the Runner - in the end both are simple HTTP PUT requests sent to the chosen S3 server.

Also remember that in some cases it's better to use the local cache instead of the remote cache feature. With files stored locally there isn't much that can go wrong, and it's definitely the fastest solution (however, it can't be used for all workflows).

> - IF YOU USE THE BUILT IN CACHE, parallelism will be hard (you cannot populate part of the cache from a job, another part from another job and in the next step use the result of both cache)

Well, it depends :)

Our cache feature was designed with specific workflows in mind. The priority is to allow a particular job to be sped up (but the job should be configured in such a way that it still works even if the cache is not available). We've made it possible to re-use the cache between parallel jobs, but as usual with more complex designs - it's hard to handle all cases.
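To make the "still works without the cache" point concrete, here is a small sketch. The job name, the cache key and the `vendor/bundle` path are illustrative assumptions. The important part is that the job always runs the install step itself, so a missing cache only makes it slower.

```yaml
# Sketch only: names and paths are illustrative.
rspec:
  stage: test
  script:
    # Always install; with a warm cache this is nearly a no-op,
    # with a cold or missing cache it simply downloads everything.
    - bundle install --path vendor/bundle
    - bundle exec rspec
  cache:
    key: gems            # same key for every run of this job
    paths:
      - vendor/bundle
```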

But what it was not designed for, and what confuses new users from time to time, is passing things from one job to another. This is where the artifacts feature should be used. The cache feature was just never designed for this, and we've always been loud about it :)
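For passing files between jobs, the artifacts version looks roughly like this. It's a sketch with made-up job names, commands and paths. By default, jobs in a later stage download the artifacts of jobs from earlier stages.

```yaml
# Sketch only: job names, commands and paths are hypothetical.
stages:
  - build
  - test

compile:
  stage: build
  script:
    - make build                 # assumed to write its output into build/
  artifacts:
    paths:
      - build/
    expire_in: 1 week            # keep storage usage bounded

unit_tests:
  stage: test
  script:
    - ./build/run-tests          # build/ was restored from the compile job's artifacts
```

Unlike the cache, the job in the later stage can rely on these files being there.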

But it doesn't mean that the cache can't be used with a parallel pipeline. By using configuration options like `key` and/or `policy` and setting them properly for different jobs, it's possible to prepare the cache in one job and then re-use it in many parallel jobs in the next stages. This is exactly what's done for the GitLab CE and GitLab EE projects: https://gitlab.com/gitlab-org/gitlab-ce/blob/v11.2.0/.gitlab.... Look for the `default-cache`, `push-cache` and `pull-cache` YAML anchors and check how they are used. In GitLab CE's pipeline, the `setup-test-env` job calls `bundle install`, and all downloaded gems are then stored in the cache. In the next stage, where all tests are executed, the same cache is downloaded, which speeds up the `bundle install` executed in all test jobs.
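A stripped-down illustration of that pattern is below. It is not a copy of the GitLab CE file: it borrows only the anchor names and the `setup-test-env` job name mentioned above, while the key, paths, stage names and the `rspec` job are assumptions for the sake of the example.

```yaml
# Sketch only: key, paths, stages and the rspec job are assumed, not taken from GitLab CE.
.default-cache: &default-cache
  key: "gems-v1"
  paths:
    - vendor/ruby

.push-cache: &push-cache
  cache:
    <<: *default-cache
    policy: push                 # only upload the cache at the end of the job

.pull-cache: &pull-cache
  cache:
    <<: *default-cache
    policy: pull                 # only download the cache, never re-upload it

stages:
  - prepare
  - test

setup-test-env:
  <<: *push-cache
  stage: prepare
  script:
    - bundle install --path vendor/ruby

rspec:
  <<: *pull-cache
  stage: test
  script:
    - bundle install --path vendor/ruby   # fast when the cache was pulled
    - bundle exec rspec
```

Because the test jobs only pull, any number of them can run in parallel without racing to overwrite the same cache archive.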

So in the end, it depends on what you're expecting:

- If you want to pass things from one job to another: it's not that the cache doesn't work. You should just use artifacts for this, since the cache was never designed to handle such a workflow.

- If your pipeline is not too complicated, then configuring the cache for parallel usage should not be a big problem.

- If you have a complex pipeline... well - there will definitely be cases where our cache feature won't be of much use. In those cases you need to choose whether to refactor the pipeline so it fits how the cache works, or to look for your own way to speed up jobs. But I'd say that in most cases it's possible to configure the pipeline in such a way that it can use the cache.