EDIT: so the main thing is technologies like Ray have a way to do these things, but I honestly just want an easy way to do this. Maybe means I will have to set up something with Ray and AWS myself and make a wrapper for that?
I haven't used ray, but I've read a bit of documentation on it and from what I gather, you install a daemon on the box (ray core) and can send it commands that it executes. Along the way, you can keep state, store data and schedule things.
That's what I would want something to do if I was building tooling like this myself. Although, I'd do it in golang instead of python so that the dependency chain was simpler. A single small binary is nicer imho.
EDIT: so the main thing is technologies like Ray have a way to do these things, but I honestly just want an easy way to do this. Maybe means I will have to set up something with Ray and AWS myself and make a wrapper for that?