Hacker News new | ask | show | jobs
by bitL 2917 days ago
> Instead, you can just wrap the DataGenerator in a simple function that lazily outputs the next batch of training examples.

You probably know those simple generators aren't recommended to be used by Keras, instead keras.utils.Sequence is preferred due to (Keras doc): "Sequence are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch which is not the case with generators."

I couldn't see any equivalent of this for estimators, sadly, and wrapping it up in a naive generator seemed like a functionality downgrade.

> because this is part of the compiled Keras model, before ever converting anything to TensorFlow Estimator

Right, you specify a loss function before compiling, however if it is a custom one and you for some reason need to reload a model snapshot (i.e. resuming training), you need to provide it separately or the loading fails. I haven't found any docs on this. Imagine your training optimizer automatically generating loss functions by means of function composition, e.g. you put a mix of +-*/,log,exp,tanh etc. based off some past training experience of what helped in individual cases/literature, then taking 1000s of these loss functions and pushing them to a large cluster where they are scored on how well did they perform, keeping only the best performing ones.

Class weights are specified in fit_generator(), not in compile time; again, here I couldn't find any description on how to convert Keras' weight dictionary to what TensorFlow needs.

> Callbacks... "Penalizing" Keras because TensorFlow offers less functionality doesn't seem right.

The thing here is that some of those callbacks are mandatory for a training to converge, e.g. decreasing learning rate, escaping plateau situations, computing various stats that aren't provided by Keras (outside loss/accuracy; you might want F1, Fleiss/Cohen's Kappa, Matthews correlation coefficient, AUC ROC etc.) that might be decisive for keeping/discarding a model; then also multi GPU callbacks; some people even use callbacks to perform the whole distributed computation as well. In my examples, if I remove any of those callbacks, my models won't achieve any kind of usability but with those callbacks I match world-class results. I couldn't find any non-insane way to map them to TensorFlow prior to our conversation.

As I mentioned, I have a very large cluster, each node with multiple GPUs, so I need an orchestration on both hyperparameters/loss functions per node as well as within each node to run on multiple GPUs.

The page from Keras you mentioned was precisely my starting point and from those tf.Estimator seemed the last devops-intense way to go (Horovod needs MPI and CERNDB/Keras Spark).

I'll take a deeper look into SessionRunHook you mentioned - thanks! ;-)

1 comments

For class weights, the easiest thing is to just generate that as another one of the items placed into the input function dictionary, e.g. when you wrap the DataGenerator. Then have a custom loss function that takes this input element and applies the weight for that training sample. Again, the need to do slight extra work is a limitation of TensorFlow here, not of Keras, but because Keras is so flexible, it's super easy to work around.

> "computing various stats that aren't provided by Keras.."

It seems like you have this backward. Keras provides the easy interface to create the custom callbacks. That's why you can create extra convergence metrics, etc., that are far harder to use if implementing in pure TensorFlow. The part where TensorFlow is specifically lacking functionality is in its ability to handle these callbacks (both pre-built in Keras or user-defined). I've had good success with the solution I mentioned with SessionRunHooks, but still, it is a terrible design choice by the TensorFlow people to create this in a way that is not directly compatible with all the work Keras had done.

> "from those tf.Estimator seemed the last devops-intense way to go (Horovod needs MPI and CERNDB/Keras Spark)."

Just based on how poorly designed the tf.Estimator API is though, I'm not actually sure the other methods would require less devops or less investment. In some cases for standard models, yes. But if you've already committed to using Keras for very customized situations, then going back to the dark ages with native TensorFlow will often be much more work and more error prone than using the other solutions. The Horovod dependence on MPI in particular is fairly simple and needs little management. Most people having done ML / stats PhDs will already have managed far more difficult situations with MPI previously anyway, or at least have the Linux skills needed. The point is you have a fighting chance, whereas deciphering undocumented and badly designed corners of TensorFlow often leaves you with no fighting chance.