by gojomo 3880 days ago
At least back in 2012, Google's research seemed to suggest that distributed CPU training of large models could sometimes be preferable to squeezing the model into the memory limits of GPUs:

http://research.google.com/archive/large_deep_networks_nips2...