Hacker News new | ask | show | jobs
by orm 3792 days ago
I think one reason would fault tolerance. Is there a fault tolerance layer in GNU parallel? last time I checked their homepage ( a few minutes ago), there was no reference to fault tolerance.

Another reason is, perhaps, scheduling.

2 comments

what fault tolerance does spark give you in this scheme? It cannot look into TF progress and checkpoint all state. Using Spark with TF, seems like an overkill -- you need to manage and install two framework what should ideally be a 200 line python wrapper or small mesos framework at most.
Does --retries count as fault tolerance?