Someone already mentioned the Argo family of tools. I've had a very good experience combining Argo CD and Argo Workflows. It's not as advanced as Airflow on the workflow side, but it works pretty well.
There's always the option to fall back on Rundeck and its peers from older times.
I think the question itself begets more questions. For example, I've seen Jenkins jobs replaced with an Apache Airflow DAG or two, but obviously that's not the right call for continuously deploying applications to K8S clusters, which is where I'd use Tekton, Harness, or Argo CD. If we go way back, companies used to churn out Perl and Ruby scripts, and eventually those were replaced with anything from Python to Go to Rust. It always comes back to _why_ someone finds what Jenkins jobs provide deficient. Ask 10 people who hate Jenkins and you're likely to find 10 different products or solutions that suit them.
Most folks I've seen replace Jenkins as a job runner wind up using Ansible, Salt, or Rundeck if they're not in a giant enterprise (there's also the old HP Operations Orchestration system, but that's unfortunately buried now). All of these have their own brewing holy wars and deficiencies, sure, but I personally prefer those warts over the extra maintenance burden of adding Jenkins. I'd also suggest StackStorm for a more modern, async approach to orchestration workflows. The workflow software ecosystem is built around business-domain specialization, and it's kind of silly for a company to try to go toe to toe with Jenkins, so this is what we've got, sadly. After all, JetBrains makes TeamCity, which is essentially a better-architected Jenkins, yet it's definitely not very popular either.
- A website with access control that supports an identity provider.
- A properly authorized user can start a job from their browser and specify arguments as well (using checkboxes, input boxes, multi-choice boxes, etc.).
- A job works by allocating a few machines from the pool and running some shell commands on them. The number and type of machines, and the commands to run, are all customizable and can vary based on the parameters the user has entered.
- Once a job is running, you can view its text logs in real time or hit the "cancel" button, which properly cancels the job and releases all its resources. If the job fails for any reason, it sends notifications.
- From the web interface, you can view a list of all past jobs with their status, logs, outputs, test results, etc.
- The machines can be physical on-prem ones or allocated from AWS. Physical machines need nothing more than plain Linux with SSH access; AWS machines are created and destroyed in response to load.
- The whole thing needs no infrastructure other than a single master machine hosted on premises. As long as you back up the master regularly, you can recover from a complete meltdown; the only problem is that some jobs will get cancelled.
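The cancel-and-release and notify-on-failure requirements above mostly come down to wrapping the allocated machines in a try/finally. A minimal Python sketch, assuming a hypothetical `MachinePool` of host names and a caller-supplied `run_command(host, cmd)` (for example, something wrapping SSH); none of these names come from Jenkins or any other tool:

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List


class PoolExhausted(Exception):
    """Raised when the pool cannot satisfy an allocation request."""


@dataclass
class MachinePool:
    # Hypothetical pool of free host names (e.g. on-prem boxes reachable over SSH).
    free: List[str]
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def allocate(self, count: int) -> List[str]:
        # Take `count` hosts atomically, or fail without taking any.
        with self._lock:
            if count > len(self.free):
                raise PoolExhausted(f"need {count}, have {len(self.free)}")
            taken, self.free = self.free[:count], self.free[count:]
            return taken

    def release(self, machines: List[str]) -> None:
        with self._lock:
            self.free.extend(machines)


def run_job(
    pool: MachinePool,
    count: int,
    commands: List[str],
    run_command: Callable[[str, str], None],  # hypothetical, e.g. an SSH wrapper
    cancel: threading.Event,                  # set by the "cancel" button
    notify: Callable[[str], None],            # hypothetical notification hook
) -> str:
    """Run each command on every allocated host; always give the hosts back."""
    machines = pool.allocate(count)
    try:
        for cmd in commands:
            for host in machines:
                # Cooperative cancellation: checked between commands only.
                if cancel.is_set():
                    return "cancelled"
                run_command(host, cmd)
        return "succeeded"
    except Exception as exc:
        notify(f"job failed: {exc}")
        return "failed"
    finally:
        # Runs on success, failure, and cancellation alike.
        pool.release(machines)
```

Cancellation here is cooperative: the `cancel` event is only checked between commands, so a long-running command isn't interrupted mid-flight; a real runner would also need to kill the remote process.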
------
Jenkins is actually pretty bad at this task, and we need a fair amount of Groovy code (ugh...) to get it to do what we want.
But there seem to be very few alternatives that can do all that. For example, Apache Airflow is missing authentication for its web UI, and Tekton seems to require Kubernetes and (based on my reading of the docs) has no UI controls, etc.
For your needs, then, Jenkins can be relegated to simply launching tasks in other systems while holding almost no state or configuration itself. One can lock down Airflow so it's reachable only from Jenkins agents, or set up an nginx proxy in front of it to protect it from prying eyes.
Architecturally, Jenkins is an artifact of the kitchen-sink enterprise software of the 2000s. The way around it is to cut out anything one doesn’t need and to instill discipline in engineers: think about requirements carefully, and avoid spending more time hacking on Groovy Jenkins libs than doing actual work.
For a lot of what you’re describing, Rundeck was built for exactly this common case: it’s essentially a central job portal, and at the least it supports more integration options than Jenkins. Still, it’s difficult to recommend it wholeheartedly given its surprisingly low rate of adoption. The market now is such that, for an OSS product, a decent web UI with authentication is treated as something to charge for as if it were an enterprise product, so lots of start-ups will still begrudgingly fire up Jenkins in 2021.
I’m of the opinion that the time spent fighting Jenkins jobs and writing Groovy for tasks that are common and trivial in most CI systems is not acceptable at a start-up, where strong focus and minimizing distractions and non-core work are so important for success. The kind of code that defines a build doesn’t make sense for orchestration (which isn’t just a digraph but about how to react to events and edge transitions), yet that’s exactly how Jenkins DSLs end up working.
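To illustrate the difference between a build digraph and orchestration, here is a minimal Python sketch of a job modeled as a state machine that reacts to events. The states, event names, and side effects are all hypothetical and not taken from Jenkins or any other tool:

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    QUEUED = auto()
    RUNNING = auto()
    SUCCEEDED = auto()
    FAILED = auto()
    CANCELLED = auto()


# Hypothetical transition table: (current state, incoming event) -> next state.
# A build DAG only says "B runs after A"; this says how the job reacts to
# whatever happens while it is in flight.
TRANSITIONS = {
    (State.QUEUED, "start"): State.RUNNING,
    (State.QUEUED, "cancel"): State.CANCELLED,
    (State.RUNNING, "exit_ok"): State.SUCCEEDED,
    (State.RUNNING, "exit_error"): State.FAILED,
    (State.RUNNING, "cancel"): State.CANCELLED,
    (State.FAILED, "retry"): State.QUEUED,
}

# Hypothetical side effects recorded whenever a transition lands in that state.
ON_ENTER = {
    State.FAILED: "send failure notification",
    State.CANCELLED: "release machines",
}


def step(state: State, event: str, effects: List[str]) -> State:
    """Apply one event; reject events that make no sense in the current state."""
    nxt = TRANSITIONS.get((state, event))
    if nxt is None:
        raise ValueError(f"event {event!r} not allowed in state {state.name}")
    if nxt in ON_ENTER:
        effects.append(ON_ENTER[nxt])
    return nxt
```

A build pipeline only encodes edges like "deploy after test"; the table above additionally encodes what to do when a cancel or a failure arrives mid-run, which is the part this comment argues build-oriented DSLs handle poorly.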
The ability to share a machine pool with a CI system, or having integrated CI, is a plus.