by streetcat1 1384 days ago
So if you want, you can create your own scheduler.

Also, regarding preemption, you could just checkpoint your batch jobs. I.e. assume that there are failures, and be ready for them.
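
To make the checkpointing idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything Kubernetes provides: the names `run_batch`, `load_checkpoint` and `save_checkpoint`, the JSON checkpoint file, and the `checkpoint_every` interval are all made up for the example. It also assumes the checkpoint path is on storage that survives the node being preempted, and that `process` is safe to run twice on the same item, since anything done after the last checkpoint gets redone on restart.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process, or 0 if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint to a temp file, then rename it into place.

    The rename is atomic, so being killed mid-write leaves the old checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run_batch(items, process, checkpoint_path, checkpoint_every=100):
    """Process items in order, resuming from the last saved checkpoint."""
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(items)):
        process(items[i])
        # Record progress periodically; a preemption loses at most
        # checkpoint_every items of work.
        if (i + 1) % checkpoint_every == 0:
            save_checkpoint(checkpoint_path, i + 1)
    # Mark the whole batch as done.
    save_checkpoint(checkpoint_path, len(items))
```

If the job is killed and rescheduled, calling `run_batch` again with the same checkpoint path picks up from the last saved index instead of starting over.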

I also wonder how many preemptions actually occur in practice. I.e. if there aren't many, you can just treat them as failures and restart your nodes.

The strength of Kubernetes is in its distribution and ecosystem, which would be very hard to match.

1 comment

This was the exact problem that MapReduce solved. Once you reach a large enough scale (which isn't all that big), you are guaranteed to have failed jobs and a need to recover.