Hacker News
by scottlegrand 3693 days ago
Absolutely 100% agree, but at the same time, I think we will ultimately need to build and evaluate models that span the memory of more than one processor. I don't think a single GTX Titan X, a GTX 1080, or even a single server will be enough here.

Additionally, data parallelism and ASGD (asynchronous SGD) largely rule out these larger models, since every worker has to hold a full copy of the model (yes, I know about send/receive nodes in TensorFlow, but IMO they're not general or automatic enough for researchers). On top of that, ASGD makes horribly inefficient use of the very limited bandwidth between processors. All IMO, of course. There are hacks and tricks here, but I think those should be late-stage optimizations, not requirements for achieving scaling.
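To put rough numbers on the bandwidth point, here is a back-of-envelope sketch of a naive parameter-server setup for ASGD. It assumes each worker pushes a full fp32 gradient and pulls the full fp32 weights every step, and that all of that traffic funnels through the server's single link. The function names, the 100M-parameter model, the 8 workers, and the 10 GB/s link are illustrative assumptions, not measurements.

```python
def naive_ps_traffic_bytes(num_params: int, bytes_per_param: int, num_workers: int):
    """Bytes moved per step in a naive parameter server (hypothetical model):
    every worker pushes a full gradient and pulls the full weights."""
    model_bytes = num_params * bytes_per_param
    per_worker = 2 * model_bytes               # push gradient + pull weights
    server_total = num_workers * per_worker    # all of it crosses the server's link
    return per_worker, server_total


def transfer_seconds(num_bytes: int, link_bytes_per_s: float) -> float:
    """Idealized transfer time: ignores latency, protocol overhead, and overlap."""
    return num_bytes / link_bytes_per_s


per_worker, server_total = naive_ps_traffic_bytes(100_000_000, 4, 8)
print(per_worker, server_total)                # 800000000 6400000000
print(transfer_seconds(server_total, 10e9))    # 0.64
```

Under these assumptions the server's link spends about 0.64 s per step just moving weights and gradients, and that cost grows linearly with the number of workers, which is the kind of overhead the hacks and tricks above try to hide.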

Finally, as someone who spent a decade writing graphics drivers before joining the CUDA team in 2006, I'm a stickler for deterministic computation. That's pretty much a "hear me now, believe me later" opinion of mine, earned by tracking down too many bizarro race conditions late into the night in that former life :-). Of course, one person's race condition can sometimes be an ANN's regularizer, but I digress.
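One well-known way nondeterminism creeps into numerical code is that floating-point addition is not associative, so a reduction whose order depends on thread scheduling (for example, many threads doing atomic adds into one accumulator) can return different answers on different runs. The sketch below is a plain-Python simulation of that effect, not CUDA code: it enumerates every addition order for four hand-picked values and shows the distinct results, then uses `math.fsum` as one example of an order-independent, correctly rounded sum.

```python
import itertools
import math


def sequential_sum(values):
    """Left-to-right float accumulation, like one accumulator receiving adds."""
    total = 0.0
    for v in values:
        total += v
    return total


# Chosen so that 1.0 is absorbed whenever the running total is +/-1e20.
values = [1e20, 1.0, -1e20, 1.0]

# Every distinct result reachable just by reordering the additions, as an
# unsynchronized reduction might do from one run to the next.
outcomes = sorted({sequential_sum(p) for p in itertools.permutations(values)})
print(outcomes)            # [0.0, 1.0, 2.0]

# An exactly rounded sum gives the same answer regardless of order.
print(math.fsum(values))   # 2.0
```

A fixed reduction order (or a tree reduction with a fixed shape) is the usual way to get bitwise-reproducible results, at some cost in parallelism.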

I also agree we'll do some amazing things with far fewer neurons and weights than an actual human brain, but I'll bet you good money we end up needing more than 12GB to do it. AlphaGo alone was 200+ GPUs, right?
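As a rough sense of scale for the 12GB figure, the sketch below computes an upper bound on how many parameters fit in 12 GB for a few per-parameter footprints. The footprints (fp32 weights, plus gradients, plus two Adam-style optimizer moment buffers) and the treatment of 12 GB as 12 GiB are assumptions for illustration; activations and workspace, which often dominate during training, are ignored, so real limits are lower.

```python
# Assumption: 12 GB of device memory, treated here as 12 GiB.
DEVICE_BYTES = 12 * 2**30

# Assumed bytes per parameter, all fp32; activations and workspace ignored.
FOOTPRINTS = {
    "weights only": 4,
    "weights + gradients": 8,
    "weights + gradients + two optimizer moments": 16,
}


def max_params(device_bytes: int, bytes_per_param: int) -> int:
    """Upper bound on parameters that fit before any activations are stored."""
    return device_bytes // bytes_per_param


for label, nbytes in FOOTPRINTS.items():
    print(f"{label}: {max_params(DEVICE_BYTES, nbytes) / 1e9:.2f}B params")
```

Under these assumptions a single 12 GB card tops out somewhere between roughly 0.8 and 3.2 billion parameters, well short of the synapse count usually estimated for a human brain.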