by brianchu
3182 days ago
This article isn't very detailed, but it's a sentiment I agree with, so I'll add one major shortcoming of TensorFlow: its memory usage is really bad.

By default, TF allocates as much GPU memory as it can for itself from the outset. There is an option (allow_growth) to allocate memory incrementally instead, but when I tried it recently it was broken. With that option broken and TF grabbing everything up front, there's no easy way to figure out exactly how much memory TF is actually using (e.g. if you want to increase the batch size). I believe you can use their undocumented profiler, but I ended up just tweaking batch sizes until TF stopped crashing (yikes).

TF also lacks in-place support for some common operations that could use it, like dropout (other operations do have this support, I believe). Even Caffe, which I used for my research in college, had this. Depending on your model, this can double your GPU RAM usage, and GPU RAM is absolutely a precious resource.

Finally, I've had TF run out of GPU RAM halfway through training, which should never happen: if there's enough memory for the first epoch, there should be enough memory for every epoch. The last thing I want to do is debug a memory leak or bad memory allocation ordering in TF.

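The trial-and-error batch-size tweaking can at least be automated with a binary search. This is a minimal, framework-agnostic sketch: `largest_fitting_batch_size` and the `fits` callable are hypothetical names. You would supply `fits` yourself, for example by running one training step at the given batch size and returning False if it runs out of memory (in TF that would mean catching `tf.errors.ResourceExhaustedError`). The sketch assumes that if a batch size fits, every smaller one does too, and the default upper bound of 4096 is an arbitrary choice.

```python
from typing import Callable


def largest_fitting_batch_size(
    fits: Callable[[int], bool], lo: int = 1, hi: int = 4096
) -> int:
    """Return the largest batch size in [lo, hi] for which fits(size) is True.

    Assumes monotonicity: if a batch size fits, every smaller one fits too.
    Returns 0 if even `lo` does not fit.
    """
    # `fits` is a hypothetical user-supplied probe, e.g. "run one training
    # step at this batch size and report whether it ran out of memory".
    if not fits(lo):
        return 0
    best = lo
    left, right = lo + 1, hi
    while left <= right:
        mid = (left + right) // 2
        if fits(mid):
            best = mid       # mid works; look for something larger
            left = mid + 1
        else:
            right = mid - 1  # mid is too big; look smaller
    return best


# Simulated probe: pretend each sample costs 3 units of a 1000-unit budget.
print(largest_fitting_batch_size(lambda b: b * 3 <= 1000))
```

Since training can still run out of memory partway through, a one-step probe may be optimistic, so it's probably wise to leave some headroom below whatever this returns.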
There is also per_process_gpu_memory_fraction, which limits TensorFlow to allocating only that fraction of each visible GPU's memory. It's still not great, but it has helped keep resources free for models that don't need all of a GPU's memory.
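For reference, both options discussed here are fields of the GPUOptions message used to configure a TF 1.x-style session. This is a minimal sketch that assumes TensorFlow is installed. Under TF 2.x (and late 1.x releases) these classes are reached through `tf.compat.v1`, and the 0.4 fraction is an arbitrary illustrative value. The two options are shown as separate configs rather than combined.

```python
import tensorflow as tf

# TF 1.x-style configuration; tf.compat.v1 exposes the same classes in TF 2.x.
v1 = tf.compat.v1

# Option 1: don't preallocate; grow GPU allocations as the process needs them.
growth_config = v1.ConfigProto(gpu_options=v1.GPUOptions(allow_growth=True))

# Option 2: cap this process at a fixed fraction of each visible GPU's memory.
# 0.4 is an arbitrary example value.
fraction_config = v1.ConfigProto(
    gpu_options=v1.GPUOptions(per_process_gpu_memory_fraction=0.4)
)

# A graph-mode session uses whichever config you pass in, e.g.:
# with v1.Session(config=fraction_config) as sess:
#     ...
```

In TF 2.x's eager mode, the rough counterpart of allow_growth is `tf.config.experimental.set_memory_growth(gpu, True)`, called for each physical GPU before any of them is initialized.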