Hacker News
by Cacti 2704 days ago
Are you sure you’re writing it in a reasonable way? I’ve built some awfully large nets (implementing current state-of-the-art models) with TF and I’ve never run into anything like that. I might wait 30 seconds to a minute before it gets moving, but that’s the most I’ve seen, and that includes the entire init process (reserving GPUs, preprocessing enough data to keep the cache moving, initializing variables for optimizers, and so on). Some of these models have an absolute ton of ops.

What are you doing that requires a 10-, 30-, or 45-minute graph build?


I'm fairly sure the model is implemented in a reasonable way. It's an experimental deep generative model based on https://github.com/openai/glow, though more complex, because the warp and its inverse are both evaluated at training time and their outputs are fed to other parts of the model. The warp has around 200 layers, IIRC. The model has to keep track of how the log-determinant of the warp evolves after each operation, along with the derivatives of those quantities, so the graph can get pretty huge.
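To make the log-determinant bookkeeping concrete, here is a minimal NumPy sketch of a Glow-style stack of affine coupling layers. It is an illustration under stated assumptions, not the poster's model: the small linear maps standing in for the coupling networks, the reversal permutation between layers (Glow itself uses learned invertible 1x1 convolutions), and the names `AffineCoupling` and `Flow` are all hypothetical. Each layer returns its output together with its log|det J| term; the flow sums those terms and records the running total after every layer. Gradients are not computed here.

```python
import numpy as np


class AffineCoupling:
    """Hypothetical Glow-style affine coupling layer (simplified sketch).

    x is split into (xa, xb); xb is scaled and shifted by functions of xa,
    so the Jacobian is triangular and log|det J| = sum(log_s).
    """

    def __init__(self, dim, rng):
        self.half = dim // 2
        # Small random linear maps stand in for the neural nets used in Glow.
        self.W_s = 0.1 * rng.standard_normal((self.half, dim - self.half))
        self.W_t = 0.1 * rng.standard_normal((self.half, dim - self.half))

    def _scale_shift(self, xa):
        log_s = np.tanh(xa @ self.W_s)  # bounded log-scale for stability
        t = xa @ self.W_t
        return log_s, t

    def forward(self, x):
        xa, xb = x[..., :self.half], x[..., self.half:]
        log_s, t = self._scale_shift(xa)
        yb = xb * np.exp(log_s) + t
        return np.concatenate([xa, yb], axis=-1), log_s.sum(axis=-1)

    def inverse(self, y):
        ya, yb = y[..., :self.half], y[..., self.half:]
        log_s, t = self._scale_shift(ya)  # ya == xa, so same scale/shift
        xb = (yb - t) * np.exp(-log_s)
        return np.concatenate([ya, xb], axis=-1), -log_s.sum(axis=-1)


class Flow:
    """Stack of couplings; reversal between layers so every dim gets warped."""

    def __init__(self, dim, n_layers, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [AffineCoupling(dim, rng) for _ in range(n_layers)]

    def forward(self, x):
        logdet = np.zeros(x.shape[:-1])
        trajectory = []  # running log-determinant after each layer
        for layer in self.layers:
            x, ld = layer.forward(x)
            x = x[..., ::-1]  # reversal permutation: |det| = 1, adds 0
            logdet = logdet + ld
            trajectory.append(logdet)
        return x, logdet, trajectory

    def inverse(self, y):
        logdet = np.zeros(y.shape[:-1])
        for layer in reversed(self.layers):
            y = y[..., ::-1]  # undo the reversal first
            y, ld = layer.inverse(y)
            logdet = logdet + ld
        return y, logdet
```

In a static-graph framework, every one of these per-layer operations (and, once autodiff runs, their gradients) becomes nodes in the graph, so with roughly 200 layers evaluated in both directions the op count grows quickly.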
Interesting, thanks.