


	by sumnuyungi 1905 days ago
	The title is a bit misleading as this algorithm is for feedforward networks and doesn't yet support convolutional layers or any of the SOTA techniques for image classification... which is why GPUs reign supreme for training deep neural nets.

5 comments

choppaface 1905 days ago

NeRF is a good example of a network that doesn't have convolutions yet requires a ton of iterations to train. This paper is particularly relevant to wide networks which are important because CPU memory is currently much cheaper than GPU memory (even for FANG researchers!).


sumnuyungi 1904 days ago

Interesting, I didn't know that NeRF was simply a feedforward network.

I hope that this research group can make more headway into training on CPUs, but I also would like to (naively) see less hyperbolic titles. This paper is not just particularly relevant to wide networks - it's only relevant to wide networks.


gugagore 1904 days ago

I think you mean to say "fully connected" in place of "feed-forward" when trying to draw a distinction with respect to "convolutional".
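To make the distinction concrete, here is a minimal sketch in plain Python. The function names `dense` and `conv1d` are illustrative and not taken from the paper. Both layers are feed-forward (data flows one way, with no recurrence); the difference is that the fully connected layer has a separate weight for every input-output pair, while the convolution reuses one small kernel at every position.

```python
def dense(x, W, b):
    """Fully connected layer: every output unit sees every input."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def conv1d(x, kernel):
    """1-D convolution (stride 1, no padding): one shared kernel slides over x.

    Like most deep learning libraries, this is really cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


x = [1.0, 2.0, 3.0, 4.0]

# Dense: 2 outputs x 4 inputs = 8 weights, plus 2 biases.
W = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 0.5, 0.5, 0.5]]
b = [0.0, 1.0]
dense_out = dense(x, W, b)        # [1.0, 6.0]

# Convolution: only 2 weights, shared across all 3 output positions.
conv_out = conv1d(x, [1.0, -1.0])  # [-1.0, -1.0, -1.0]
```

So "feed-forward" covers both cases; "fully connected" is the term that excludes the weight-sharing convolution.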


stjohnswarts 1904 days ago

This is why I come to HN: to find out why it doesn't work in the general case. I can always count on you guys to point out why something is an evolutionary change rather than a revolutionary one.


waheoo 1905 days ago

> doesn't yet

Does that mean it can / will?


sumnuyungi 1905 days ago

The original paper includes convolutional layer support in their future work & next steps. But it's not a foregone conclusion that the same speedup will occur.


ramoz 1904 days ago

True. I know it exists for inference though. Wondering where/when solutions like MKL might work for training.


shgidi 1904 days ago

Right, that's kinda nasty. Titles of papers refer to deep learning, but I don't think fully connected networks can be considered deep learning.


sanxiyn 1904 days ago

What? No. Fully connected networks are deep learning, and actually the most important deep learning workload. See: Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective:

https://research.fb.com/publications/applied-machine-learnin...

Table 1 shows that the News Feed service uses a fully connected network model, and Table 3 shows that this workload dominates all other workloads.


Cybiote 1904 days ago

Transformers, which are currently waging a successful campaign to conquer all of deep learning, are largely stacked feed-forward networks, matrix multiplies and maps. Some ideas to make attention more scalable, such as LSH or large sparse attention matrices, seem like they'd be well suited to this approach.

Their approach should also be readily adaptable to RNNs, including LSTMs.

Certainly worth investigating as an alternative for efficiently running and training giant networks on less expensive hardware.
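As a rough illustration of the "stacked feed-forward networks, matrix multiplies and maps" point, here is a minimal plain-Python sketch of the position-wise feed-forward sub-block found in a standard Transformer layer. It assumes a ReLU activation and leaves out attention, residual connections and layer norm; the names `ffn_block` and `apply_positionwise` and the toy weights are made up for the example.

```python
def matvec(W, x):
    """Matrix-vector product: one dot product per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Elementwise map: clamp negatives to zero."""
    return [max(0.0, a) for a in v]


def ffn_block(x, W1, b1, W2, b2):
    """Two fully connected layers with a ReLU in between."""
    h = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return [a + b for a, b in zip(matvec(W2, h), b2)]


def apply_positionwise(seq, params):
    """Apply the same block (same weights) to every token independently."""
    return [ffn_block(tok, *params) for tok in seq]


# Toy weights: model width 2, hidden width 3.
W1 = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b2 = [0.0, 0.0]

seq = [[2.0, 1.0], [1.0, 3.0]]                    # two tokens
out = apply_positionwise(seq, (W1, b1, W2, b2))   # [[1.0, 3.0], [2.0, 4.0]]
```

In real models the hidden layer is typically several times wider than the model dimension, so this sub-block is a wide fully connected layer of the kind the thread is discussing.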
