
by w-m 1063 days ago

Integral Neural Networks (CVPR 2023 Award Candidate), a nifty way of building resizable networks.

My understanding of this work: in a forward pass through a (fully-connected) layer of a neural network, each output is just a dot product of the layer input with that output's weights, followed by some activation function. Both the input and the weights are vectors of the same fixed size.

Let's imagine that the discrete values that form these vectors happen to be samples of two different continuous univariate functions. Then we can view the dot product (up to a step-size factor) as an approximation to the integral of the product of the two continuous functions.
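To make the "dot product as integral" view concrete, here is a minimal NumPy sketch. The two functions `f` and `g` are made-up stand-ins for the input and weight functions, and the trapezoidal rule is just one possible choice of quadrature; none of this is taken from the paper's code.

```python
import numpy as np

def integral_dot(f, g, n):
    """Approximate the integral of f(x) * g(x) over [0, 1] from n samples."""
    x = np.linspace(0.0, 1.0, n)
    q = np.full(n, 1.0 / (n - 1))  # trapezoidal quadrature weights (step size)
    q[0] *= 0.5
    q[-1] *= 0.5
    # A dot product of the sampled "input" with the sampled, quadrature-scaled "weights".
    return np.dot(f(x), g(x) * q)

f = lambda x: np.sin(2 * np.pi * x)      # hypothetical input function
g = lambda x: np.sin(2 * np.pi * x) + x  # hypothetical weight function

coarse = integral_dot(f, g, 16)    # a "small" layer
fine = integral_dot(f, g, 256)     # a "large" layer
exact = 0.5 - 1.0 / (2 * np.pi)    # closed-form value of the integral
print(coarse, fine, exact)
```

Both sample counts approximate the same number, which is the sense in which differently sized layers can perform the same operation.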

Now instead of storing the weights of our network, we store some values from which we can reconstruct a continuous function, and then sample it where we want (in this case some trainable interpolation nodes, which are convolved with a cubic kernel). This gives us the option to sample different-sized networks, but they are all performing (an approximation to) the same operation. After training with samples at different resolutions, you can freely pick your network size at inference time.
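A rough sketch of that resampling step, assuming a uniform grid of nodes on [0, 1], the standard cubic convolution kernel (a = -0.5) and edge-replicated padding. The function names are hypothetical and this is not the TorchIntegral API.

```python
import numpy as np

def cubic_kernel(t, a=-0.5):
    """Cubic convolution kernel; non-zero only for |t| < 2."""
    t = np.abs(t)
    out = np.zeros_like(t)
    near = t <= 1
    far = (t > 1) & (t < 2)
    out[near] = (a + 2) * t[near] ** 3 - (a + 3) * t[near] ** 2 + 1
    out[far] = a * t[far] ** 3 - 5 * a * t[far] ** 2 + 8 * a * t[far] - 4 * a
    return out

def sample_weights(nodes, n):
    """Sample n weights from the continuous function defined by the stored nodes."""
    m = len(nodes)
    x = np.linspace(0.0, 1.0, n) * (m - 1)  # sample positions in node-index units
    idx = np.arange(-1, m + 1)              # node indices, one padded node per side
    padded = np.concatenate(([nodes[0]], nodes, [nodes[-1]]))
    return cubic_kernel(x[:, None] - idx[None, :]) @ padded

def integral_neuron(nodes, input_fn, n):
    """One output of an 'integral' layer evaluated at resolution n (trapezoidal rule)."""
    x = np.linspace(0.0, 1.0, n)
    q = np.full(n, 1.0 / (n - 1))
    q[[0, -1]] *= 0.5
    return np.dot(sample_weights(nodes, n) * q, input_fn(x))

nodes = np.sin(np.linspace(0.0, np.pi, 8))  # stand-in for trained interpolation nodes
input_fn = lambda x: np.cos(3 * x)          # stand-in for a continuous input
small = integral_neuron(nodes, input_fn, 32)
large = integral_neuron(nodes, input_fn, 128)
print(small, large)  # two layer sizes, approximately the same output
```

The only stored parameters here are the 8 nodes; the 32 or 128 weights are generated on demand.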

You can also take a pretrained network, reorder its weights to make the functions as smooth as possible, and then compress it by downsampling. In their experiments, networks compressed this way lose much less accuracy than with common pruning approaches.
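A toy illustration of the reorder-then-downsample idea on a two-layer MLP. The greedy nearest-neighbour ordering, the linear interpolation and the rescaling of the second layer are simple stand-ins chosen for this sketch, not necessarily what the paper does. The key point is that permuting hidden units (rows of `W1` together with the matching columns of `W2`) leaves the network's output unchanged.

```python
import numpy as np

def total_variation(W):
    """Sum of absolute differences between neighbouring rows."""
    return np.abs(np.diff(W, axis=0)).sum()

def greedy_order(W):
    """Greedy nearest-neighbour ordering of the rows of W (heuristic stand-in)."""
    remaining = list(range(1, len(W)))
    order = [0]
    while remaining:
        last = W[order[-1]]
        nxt = min(remaining, key=lambda i: np.abs(W[i] - last).sum())
        remaining.remove(nxt)
        order.append(nxt)
    return np.array(order)

def downsample_rows(W, n):
    """Linearly resample the rows of W to n rows."""
    old = np.linspace(0.0, 1.0, len(W))
    new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(new, old, W[:, j]) for j in range(W.shape[1])], axis=1)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 16))  # hidden x input
W2 = rng.normal(size=(4, 64))   # output x hidden
x = rng.normal(size=16)

perm = greedy_order(W1)
W1p, W2p = W1[perm], W2[:, perm]  # same permutation on both sides of the hidden layer

y_before = W2 @ np.maximum(W1 @ x, 0)
y_after = W2p @ np.maximum(W1p @ x, 0)

# Compress the hidden layer from 64 to 32 units; the factor 64 / 32 compensates
# for summing over fewer samples (an assumption of this sketch).
W1s = downsample_rows(W1p, 32)
W2s = downsample_rows(W2p.T, 32).T * (64 / 32)
print(total_variation(W1), total_variation(W1p), W1s.shape, W2s.shape)
```

With random weights like these, the downsampled network will not be accurate; the sketch only shows the mechanics. The smoother the reordered weights, the less information the resampling throws away.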

Paper: https://openaccess.thecvf.com/content/CVPR2023/papers/Solods...

Code: https://github.com/TheStageAI/TorchIntegral

6 comments

numbers_guy 1063 days ago

Going just by your description this sounds like they are doing operator learning. It's actually a very old idea. The proof that started operator learning is from 1988 I believe. Mathematicians have been playing around with the idea since 2016 at least.


w-m 1063 days ago

Indeed, this seems closely related, thanks for the pointer!

Unfortunately I'm not deep enough into the topic to understand what their contribution to the theory part of it is (they have some Supplementary Material in [INN Supp]). In the discussion of the Integral Neural Networks (INN) paper, there's this paragraph about an operator learning publication:

"In [24] the authors proposed deep neural networks with layers defined as functional operators. Such networks are designed for learning PDE solution operators, and its layers are continuously parameterized by MLPs only along the kernel dimensions. A re-discretization was investigated in terms of training on smaller data resolution and testing on higher input resolution. However, the proposed framework in [24] does not include continuous connections between filters and channels dimensions."

Also the weight permutation to perform the resampling on pretrained networks in INNs seems to be novel? And I guess it doesn't hurt that they're bringing new eyeballs to the topic, by providing examples of common networks and a PyTorch implementation.

[INN Supp]: https://openaccess.thecvf.com/content/CVPR2023/supplemental/...

[24]: Zongyi Li, Nikola Kovachki, et al. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020, https://arxiv.org/abs/2003.03485


dicroce 1063 days ago

Damn. It's like jpeg for neural networks.


kirill_sldskkh 1062 days ago

Great understanding of the work! I will add more details about INNs.

* In fact, the INN concept opens up the possibility of applying differential analysis to DNN parameters. The concepts of sampling and integration can be combined with the Nyquist theorem (https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampli...). Analysing the FFT image of the weights makes it possible to define a measure of a layer's capacity. Two DNNs of different sizes can be equivalent after conversion to an INN because the max frequency is the same for both networks.

* Tuning the integration grid is actually a first step towards fast knowledge extraction. We have tested INNs on a discrete EDSR (super-resolution) model and pruned it in 1 minute without INN training. We can imagine a situation where a user fine-tunes GPT-4 for a custom task just by tuning the integration grid, simultaneously reducing the number of model parameters by keeping only the important slices along filters/rows/heads etc. Because of the smooth parameter sharing, the new filters/rows/heads include the "knowledge" of their neighbours.

* Another interesting application is to use integral layers for fast frame interpolation, since conv2d in an INN can produce any number of output channels, i.e. frames.
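As a toy illustration of the Nyquist point in the first bullet: the `bandwidth` function below is a hypothetical capacity measure (the number of rFFT bins needed to hold 99% of the spectral energy), invented for this sketch rather than taken from the INN code. Two weight vectors of different sizes sampled from the same band-limited function get the same score.

```python
import numpy as np

def bandwidth(w, energy=0.99):
    """Number of lowest rFFT bins that together hold `energy` of the spectral energy."""
    power = np.abs(np.fft.rfft(w)) ** 2
    cum = np.cumsum(power) / power.sum()
    return int(np.searchsorted(cum, energy) + 1)

# Hypothetical continuous weight function with frequencies 3 and 5.
f = lambda x: np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 5 * x)

w_small = f(np.linspace(0.0, 1.0, 64, endpoint=False))   # 64 "weights"
w_large = f(np.linspace(0.0, 1.0, 256, endpoint=False))  # 256 "weights"

print(bandwidth(w_small), bandwidth(w_large))  # same bandwidth despite different sizes
```

By the sampling theorem, any sample count above twice the highest frequency captures this function, so the larger vector carries no extra information.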

Stay tuned, and check Medium for updates on INN progress and applications. A new Medium article is already available: https://medium.com/@TheStage_ai/unlocking-2x-acceleration-fo...


smaddox 1063 days ago

Nice. A few days ago I was wondering whether something like this was possible. The next step would be somehow extending the discrete->continuous concept to layers.


smaddox 1063 days ago

Ahh, I guess that's been done, too: https://proceedings.neurips.cc/paper_files/paper/2018/file/6...

Now we just need an iterative solver over both the structure and the "weights", and we get both architecture search and training at the same time.


diracs_stache 1063 days ago

After finally learning some complex integrals/residue theory and seeing the connection to continuous and discrete signal processing, I was very happy that the "magic trick" disappeared. Your comment has me interested in pulling the string further. Thanks!


llaolleh 1063 days ago

Supercool.


ninjaa 1063 days ago

smart
