| HN Mirror


by montebicyclelo 780 days ago

The success we're seeing with neural networks is tightly coupled with the ability to scale. The algorithm itself works at scale (more layers), and it also scales well with hardware: neural nets mostly consist of matrix multiplications, and GPUs have specialised matrix multiplication acceleration. One of the most impactful neural network papers, AlexNet, was impactful because it showed that NNs could be put on the GPU, scaled and accelerated, to great effect.

It's not clear from the paper how well this algorithm will scale, both in terms of the algorithm itself (does it still train well with more layers?) and its ability to make use of hardware acceleration (e.g. it's not clear to me that the structure, with its per-weight activation functions, can make use of fast matmul acceleration).
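To make the matmul concern concrete, here is a rough sketch (not taken from the paper) that contrasts a standard dense layer with a layer that has one function per weight. The names `dense_layer`, `per_edge_layer`, `basis`, `make_phi` and `per_edge_layer_as_matmul` are hypothetical, and the shared cosine basis is an assumption chosen only for illustration. Under that assumption the per-weight layer can be rewritten as a single matrix-vector product over expanded features; whether this is efficient in practice is a separate question.

```python
import math

def dense_layer(W, x):
    """Standard layer: one matrix-vector product, then one shared activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def per_edge_layer(phi, x):
    """Per-weight activations: y[j] = sum_i phi[j][i](x[i])."""
    return [sum(f(xi) for f, xi in zip(row, x)) for row in phi]

def basis(x, K):
    """Assumed shared basis (illustrative only): cos(k * x) for k = 0..K-1."""
    return [math.cos(k * x) for k in range(K)]

def make_phi(C, n_in, K):
    """Build phi[j][i](x) = sum_k C[j][i*K + k] * cos(k * x) from coefficients C."""
    def edge(j, i):
        return lambda x: sum(c * b for c, b in zip(C[j][i * K:(i + 1) * K], basis(x, K)))
    return [[edge(j, i) for i in range(n_in)] for j in range(len(C))]

def per_edge_layer_as_matmul(C, x, K):
    """Same layer as one matrix-vector product over n_in * K basis features."""
    features = [b for xi in x for b in basis(xi, K)]
    return [sum(c * f for c, f in zip(row, features)) for row in C]

if __name__ == "__main__":
    C = [[0.5, -1.0, 0.25, 2.0, 0.0, 1.5],
         [1.0, 0.0, -0.5, 0.3, 0.7, -0.2]]  # 2 outputs, 2 inputs, K = 3
    x = [0.3, -1.2]
    print(per_edge_layer(make_phi(C, 2, 3), x))
    print(per_edge_layer_as_matmul(C, x, 3))
```

The two printed vectors should agree up to floating-point rounding. The cost is that the input is expanded from `n_in` values to `n_in * K` features before the product, so the layer is matmul-shaped but K times wider.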

It's an interesting idea that seems to work well and have nice properties at a smaller scale, but whether it's a good architecture for ImageNet, LLMs, etc. is not clear at this stage.

1 comment

dist-epoch 780 days ago

> with its per-weight activation functions

Sounds like something which could be approximated by a DCT (discrete cosine transform). JPEG compression does this, and there are hardware accelerations for it.
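As a rough illustration of the DCT idea (one reading of the comment, not something the paper is known to do), the sketch below samples a smooth 1-D function, takes an unnormalised DCT-II, and rebuilds the samples from only the first few coefficients. The names `dct2`, `idct2` and `squared_error`, and the choice of `tanh` as the function being approximated, are assumptions made for the example.

```python
import math

def dct2(samples):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    N = len(samples)
    return [sum(s * math.cos(math.pi * (n + 0.5) * k / N)
                for n, s in enumerate(samples))
            for k in range(N)]

def idct2(coeffs, N):
    """Inverse of dct2; coefficients beyond len(coeffs) are treated as zero."""
    out = []
    for n in range(N):
        total = coeffs[0] / 2
        total += sum(c * math.cos(math.pi * (n + 0.5) * k / N)
                     for k, c in enumerate(coeffs) if k > 0)
        out.append(2 * total / N)
    return out

def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

if __name__ == "__main__":
    N = 32
    xs = [-3 + 6 * (n + 0.5) / N for n in range(N)]
    samples = [math.tanh(x) for x in xs]  # stand-in for a learned activation
    coeffs = dct2(samples)
    for keep in (4, 8, N):
        approx = idct2(coeffs[:keep], N)
        print(keep, squared_error(samples, approx))
```

Keeping all `N` coefficients reproduces the samples up to rounding, and keeping fewer trades accuracy for a smaller representation, which is the same trade-off JPEG makes.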

> can make use of fast matmul acceleration

Maybe not, but matmul acceleration was done in hardware because it's useful for some problems (graphics initially).

So if these per-weight activation functions really work, people will be quick to figure out how to run them in hardware.
