This is going to be really great for batch jobs that need isolated environments. I have been waiting for something like this for a long time. Amazon is really doing work. I'll definitely be using this.
My org is looking to move our machine learning workloads to Batch as the underlying infra.
All I want is to be able to do this:
1) specify a DAG of tasks. Each task is a docker image, CMD string, CPU and memory limits
2) hit an API to run it for me. Each task runs on a new spot instance
3) be able to query this service about the state of the DAG and of each individual node
It sounds like if AWS provides an API to create a batch cluster (or whatever you call it), and lets tasks be defined in terms of which Docker image to run with which command, that would satisfy this.
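To make the wish list above concrete, here is a minimal sketch in Python of what "specify a DAG, then hit an API" could look like on the client side. Everything named here (`Task`, `submit_dag`, the `submit` callable) is hypothetical and not part of any AWS SDK. The sketch only orders the tasks so that each one is submitted after its dependencies and hands the parent job IDs to whatever backend does the real submission.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class Task:
    """One node of the DAG (hypothetical shape, mirroring the list above)."""
    name: str
    image: str            # Docker image to run
    command: List[str]    # CMD for the container
    vcpus: int            # CPU limit
    memory_mib: int       # memory limit
    depends_on: List[str] = field(default_factory=list)


def submit_dag(
    tasks: List[Task],
    submit: Callable[[Task, List[str]], str],
) -> Dict[str, str]:
    """Submit tasks in dependency order; return a task name -> job ID map.

    `submit` is an assumed callback that starts one task, given the job IDs
    of its parents, and returns the new job's ID.
    """
    by_name = {t.name: t for t in tasks}
    for t in tasks:
        for dep in t.depends_on:
            if dep not in by_name:
                raise ValueError(f"{t.name} depends on unknown task {dep}")

    # Map each node to the set of nodes that must run before it.
    graph = {t.name: set(t.depends_on) for t in tasks}

    job_ids: Dict[str, str] = {}
    # static_order() yields parents before children and raises
    # graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        parent_ids = [job_ids[dep] for dep in task.depends_on]
        job_ids[name] = submit(task, parent_ids)
    return job_ids
```

With boto3, the `submit` callback could plausibly wrap the Batch client's `submit_job` call, passing the parent IDs through its `dependsOn` parameter, and per-node state could be polled with `describe_jobs`. That mapping is an assumption on my part about how one would wire it up, not something the thread confirms, and it does not cover the "each task on a new spot instance" part.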
That is in line with our vision for Batch: to be the engine for systems where you essentially describe a DAG and we run and hyperoptimize the execution for you. We do some of what you’re asking for, but that’s great feedback on what you’d like to do.
I'm curious, what would be the interaction between Batch and Fargate? Right now I use Batch to run a container and then exit, with as little thought about the underlying machine as possible. Is Fargate a further push towards serverless?
One concern I had with Fargate from the product description is around the configuration options. Our models require more than 100GB to build, but I'm seeing "Max. 30GB".
I haven't really tried Batch, but from an initial reading of the documentation it didn't look like it supported running Docker images. My use case requires running Docker images of static site generators and the like. I'll take another look at it.
https://aws.amazon.com/batch/