by sumnuyungi 1905 days ago
The title is a bit misleading as this algorithm is for feedforward networks and doesn't yet support convolutional layers or any of the SOTA techniques for image classification... which is why GPUs reign supreme for training deep neural nets.
5 comments

NeRF is a good example of a network that doesn't have convolutions yet requires a ton of iterations to train. This paper is particularly relevant to wide networks which are important because CPU memory is currently much cheaper than GPU memory (even for FANG researchers!).
Interesting, I didn't know that NeRF was simply a feedforward network.

I hope that this research group can make more headway into training on CPUs, but I also would like to (naively) see less hyperbolic titles. This paper is not just particularly relevant to wide networks - it's only relevant to wide networks.

I think you mean to say "fully connected" in place of "feed-forward" when trying to draw a distinction with respect to "convolutional".
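To make the terminology concrete, here is a minimal pure-Python sketch (the helper names `dense`, `conv1d`, and `conv_as_dense_matrix` are made up for illustration, not taken from the paper). Both layers are feed-forward; the convolution is just a fully connected layer whose weight matrix is banded and shares the same few weights across rows.

```python
def dense(x, W):
    """Fully connected layer: every output sees every input (no bias, no activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def conv1d(x, kernel):
    """'Valid' 1D cross-correlation: each output sees only len(kernel) inputs."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def conv_as_dense_matrix(kernel, n_inputs):
    """Build the banded, weight-shared matrix that makes dense() reproduce conv1d()."""
    k = len(kernel)
    rows = []
    for i in range(n_inputs - k + 1):
        row = [0.0] * n_inputs
        row[i:i + k] = kernel  # same weights in every row, shifted by one position
        rows.append(row)
    return rows

x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
W = conv_as_dense_matrix(kernel, len(x))

# The two computations agree; only the structure of the weights differs.
assert dense(x, W) == conv1d(x, kernel)
```

A general fully connected layer has no such constraint on `W`, which is the distinction "fully connected" vs. "convolutional" is meant to capture.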
This is why I come to HN, to find out why it doesn't work in the general case. I can always count on you guys to point out why something is an evolutionary change rather than revolutionary.
> doesn't yet

Does that mean it can / will?

The original paper includes convolutional layer support in their future work & next steps. But it's not a foregone conclusion that the same speedup will occur.
True. I know it exists for inference though. Wondering where/when solutions like MKL might work for training.
Right, that's kinda nasty. The paper's title refers to deep learning, but I don't think fully connected networks would be considered deep learning.
What? No. Fully connected networks are deep learning, and actually the most important deep learning workload. See: Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective:

https://research.fb.com/publications/applied-machine-learnin...

Table 1 shows the News Feed service uses a fully connected network model, and Table 3 shows this workload dominates all other workloads.

Transformers, which are currently waging a successful campaign to conquer all Deep Learning, are largely stacked feed-forward networks, matrix multiplies and maps. Some ideas to make attention more scalable, such as LSH or large sparse attention matrices, seem like they'd be well suited to this approach.
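For anyone unfamiliar with LSH, a rough sketch of the idea using signed random projections (SimHash) follows. This is a generic illustration, not the paper's implementation and not any particular attention variant; the function names, the bit count, and the toy vectors are all assumptions. Vectors pointing in similar directions tend to land in the same bucket, so a query only needs to be compared against the keys in its own bucket instead of all of them.

```python
import random
from collections import defaultdict

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash(vec, hyperplanes):
    """Hash = tuple of sign bits, one per hyperplane (1 if the dot product is >= 0)."""
    return tuple(
        int(sum(h * v for h, v in zip(plane, vec)) >= 0.0)
        for plane in hyperplanes
    )

def build_table(keys, hyperplanes):
    """Map each hash bucket to the indices of the keys that fall into it."""
    table = defaultdict(list)
    for idx, key in enumerate(keys):
        table[simhash(key, hyperplanes)].append(idx)
    return table

def candidates(query, table, hyperplanes):
    """Return indices of keys sharing the query's bucket (possibly empty)."""
    return table.get(simhash(query, hyperplanes), [])

planes = make_hyperplanes(n_bits=8, dim=4)
keys = [
    [1.0, 0.5, -0.2, 0.3],
    [-1.0, 0.2, 0.9, -0.4],
    [2.0, 1.0, -0.4, 0.6],   # same direction as keys[0], twice the length
]
table = build_table(keys, planes)
hits = candidates(keys[0], table, planes)  # contains 0 and 2 (same direction)
```

Real systems typically use several independent tables to reduce the chance of missing a near neighbor; a single table is shown here only to keep the sketch short.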

Their approach should also be readily adaptable to RNNs, including LSTMs.

Certainly worth investigating as an alternative for efficiently running and training giant networks on less expensive hardware.