I managed to get to 0.01 loss from only x1/x2, using 3 hidden layers, L1 regularization, a bit of added noise, and some patience: http://i.imgur.com/Y3zKpJF.png
Yes, noise & regularization seem to be key here. I've gotten a 2-layer with 7/8 neurons down to 0.06 and dropping but only with noise & l1: http://playground.tensorflow.org/#activation=relu®ulariza... Final loss of 0.051. Interestingly, increasing noise from 10 to 15 destroys performance, loss of 0.47.