Last night I completed a build for the first time, on Ubuntu 16.04 with CUDA 8.0 RC + compiler patch, cuDNN 5.1, nvidia-driver-370, Python 2.7, and compute capability 6.1 (for a Pascal GPU), but only after I switched to the r0.10 branch.
With r0.10 I see none of the multiple failure modes that I always see with master. It just went straight ahead and compiled the whole thing.
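For anyone who wants to try the same route, here is a rough sketch of the steps, not the exact commands used above. It assumes you are inside a TensorFlow git checkout with CUDA and cuDNN already installed, and it follows the build-from-source instructions of that era as I understand them. The `run` and `build_steps` helpers and the `/tmp/tensorflow_pkg` output directory are my own illustrative choices. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Dry-run sketch: prints the build steps by default; set DRY_RUN=0 to run them.
# Assumption: current directory is a TensorFlow git checkout.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper grouping the four steps.
build_steps() {
    # Switch from master to the release branch that built cleanly.
    run git checkout r0.10
    # Interactive: answer the prompts for Python, CUDA, cuDNN, and compute capability.
    run ./configure
    # Build the pip-package builder with CUDA support enabled.
    run bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
    # Write the wheel to an output directory (arbitrary path, chosen for illustration).
    run bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
}

build_steps
```

The dry-run default is there so the steps can be reviewed before anything touches the checkout; the answers given to `./configure` would be the versions listed above.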
FWIW, I've now twice built TensorFlow from source and gotten a pip package linked against CUDA 8: once for Python 2 and once for Python 3, both on an Ubuntu 14.04 system.