

	by rxin 3799 days ago
	Did you actually read the article? It was using Spark to parallelize hyperparameter tuning, which is embarrassingly parallel.
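
To illustrate what "embarrassingly parallel" means here, a minimal sketch: each hyperparameter combination is scored on its own, so the jobs can be spread over any pool of workers without talking to each other. The `validation_loss` function and the grid values are hypothetical stand-ins, not taken from the article, and a local thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_loss(config):
    """Hypothetical stand-in for training one model and scoring it."""
    learning_rate, hidden_units = config
    return (learning_rate - 0.01) ** 2 + (hidden_units - 64) ** 2 * 1e-6


# Every combination is an independent job: no job needs another job's result.
grid = list(product([0.1, 0.01, 0.001], [32, 64, 128]))

# pool.map preserves input order, so losses[i] belongs to grid[i].
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(validation_loss, grid))

# The only step that looks at all results is picking the smallest loss.
best_loss, best_config = min(zip(losses, grid))
```

The only coordination is the final `min`, which is why any job runner (Spark, GNU Parallel, a thread pool) can handle the fan-out.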

2 comments

doobwa 3799 days ago

Why not just use GNU Parallel (or something similar) instead of Spark?


elyase 3799 days ago

I think this could have been done with GNU Parallel. One advantage I see with Spark is that it is easier to interact with Python; for example, these two lines are all that is needed to call the relevant Python function:

  urls = sc.parallelize(batched_data)
  labelled_images = urls.flatMap(apply_batch)

So if you already have a cluster with Spark installed (like Databricks does), then it takes less work to just call your Python code than to set up a GNU Parallel cluster and write a small wrapper script. Additionally, a Python script would have to load/init the models on every call from Parallel. I agree that this is not a great demonstration of Spark's main strengths.
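
As a rough illustration of the load/init point, here is a sketch of what a function like `apply_batch` might look like if it loads the model once per batch instead of once per item. The body of `apply_batch` is not shown in this thread, so everything below is an assumption: `load_model` is a fake loader, `local_flat_map` is a plain-Python stand-in for `flatMap` so the example runs without Spark, and the URLs are made up.

```python
from itertools import chain

load_log = []  # one entry per (fake) model initialisation


def load_model():
    """Hypothetical stand-in for an expensive model load."""
    load_log.append("loaded")
    return lambda url: "cat" if "cat" in url else "other"


def apply_batch(batch):
    """Load the model once, then label every URL in the batch."""
    model = load_model()
    return [(url, model(url)) for url in batch]


def local_flat_map(func, batches):
    """Plain-Python stand-in for flatMap: apply func, then flatten."""
    return list(chain.from_iterable(func(batch) for batch in batches))


# Made-up data: two batches holding three URLs in total.
batched_data = [["a/cat1.jpg", "a/dog1.jpg"], ["b/cat2.jpg"]]
labelled_images = local_flat_map(apply_batch, batched_data)
```

With three URLs in two batches, the model is initialised twice rather than three times; larger batches amortise the load cost further.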


orm 3799 days ago

I think one reason would be fault tolerance. Is there a fault tolerance layer in GNU Parallel? Last time I checked their homepage (a few minutes ago), there was no reference to fault tolerance.

Another reason is, perhaps, scheduling.


chimtim 3799 days ago

What fault tolerance does Spark give you in this scheme? It cannot look into TF progress and checkpoint all state. Using Spark with TF seems like overkill: you need to manage and install two frameworks for what should ideally be a 200-line Python wrapper or a small Mesos framework at most.


ole_tange 3798 days ago

Does --retries count as fault tolerance?
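
For readers unfamiliar with job-level retries, a minimal sketch of the idea: a failed job is simply run again, up to a fixed number of attempts. This is not how GNU Parallel implements `--retries`; `run_with_retries` and `flaky_job` are hypothetical names used only for illustration.

```python
def run_with_retries(job, arg, attempts=3):
    """Run job(arg); on an exception, try again, up to `attempts` tries in total.

    Returns (result, number_of_attempts_used).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return job(arg), attempt
        except Exception as error:
            last_error = error
    raise RuntimeError(f"{arg!r} failed after {attempts} attempts") from last_error


calls = {}  # how many times each argument has been tried


def flaky_job(arg):
    """Hypothetical job that fails the first time it sees each argument."""
    calls[arg] = calls.get(arg, 0) + 1
    if calls[arg] < 2:
        raise IOError("transient failure")
    return arg * 2


results = [run_with_retries(flaky_job, x) for x in [1, 2, 3]]
```

Retrying restarts a job from scratch; it does not checkpoint partial progress inside a job, which is the distinction raised above about TF state.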


gcr 3799 days ago

Oh dear. You're right, sorry. Shouldn't have commented before actually reading the article...
