by jackblemming 871 days ago
> the topics in the parent post should not be a major surprise to anyone who has read https://people.math.harvard.edu/~ctm/home/text/others/shanno... !

> which clearly explains it (and said emergent phenomena)

Very smart information theory people have looked at neural networks through the lens of information theory and published famous papers about it years ago. It couldn't explain many things about neural networks, but it was interesting nonetheless.

FWIW it's not uncommon for smart people to say "this mathematical structure looks like this other idea with [+/- some structure]!!" and that it totally explains everything... (kind of with so and so exceptions, well and also this and that and..). Truthfully, we just don't know. And I've never seen theorists in this field actually take the theory and produce something novel or make useful predictions with it. It's all try stuff and see what works, and then retroactively make up some crud on why it worked, if it did work (otherwise brush it under the rug).

There was this one posted recently on transformers being kernel smoothers: https://arxiv.org/abs/1908.11775
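To make the "kernel smoother" reading concrete, here is a minimal sketch in plain Python. It assumes single-query, single-head scaled dot-product attention, and the function names (`kernel_smoother`, `exp_dot_kernel`, `softmax_attention`) are made up for illustration, not taken from the paper. It only shows that softmax attention gives the same output as a Nadaraya-Watson style weighted average with an exponential dot-product kernel.

```python
import math

def exp_dot_kernel(q, k):
    # Assumed kernel: exp(<q, k> / sqrt(d)), the unnormalized attention score.
    scale = math.sqrt(len(q))
    return math.exp(sum(a * b for a, b in zip(q, k)) / scale)

def kernel_smoother(query, keys, values, kernel):
    # Nadaraya-Watson style estimate: kernel-weighted average of the values.
    weights = [kernel(query, k) for k in keys]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def softmax_attention(query, keys, values):
    # Standard scaled dot-product attention for one query vector.
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(query, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    dim = len(values[0])
    return [sum((e / norm) * v[j] for e, v in zip(exps, values))
            for j in range(dim)]

if __name__ == "__main__":
    q = [0.2, -0.1]
    keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(kernel_smoother(q, keys, values, exp_dot_kernel))
    print(softmax_attention(q, keys, values))
```

The two printed vectors agree up to floating-point error. Swapping in a different kernel function changes the smoother, which is roughly the design space that framing opens up.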

3 comments

I think there is more here than a backward look.

The article introduced a discrete algorithm for approximating the model produced by gradient optimization.

It would be interesting to optimize the discrete algorithm for both design and inference times, and see if any space or time advantages over gradient learning could be found. Or if new ideas popped up as a result of optimization successes or failures.

It also might have an advantage in terms of algorithm adjustments. For instance, given the most likely responses at each step, discard the most likely one whenever the follow-ups are not too far below it, and see if that reliably avoids copyright issues.
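A rough sketch of that adjustment, under stated assumptions: candidates arrive as a token-to-probability mapping, "not too far below" means an absolute probability gap of at most `margin`, and the name `pick_token` and the default margin are hypothetical. Whether this would actually avoid copyright issues is untested here.

```python
def pick_token(candidates, margin=0.05):
    """Pick the next token, skipping the top choice when the runner-up is close.

    candidates: dict mapping token -> probability (assumed input shape).
    margin: assumed threshold for "not too far below" the top choice.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Rank candidates from most to least likely.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    # Discard the most likely token only if the follow-up is within the margin.
    if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] <= margin:
        return ranked[1][0]
    return ranked[0][0]

if __name__ == "__main__":
    print(pick_token({"the": 0.50, "a": 0.48, "an": 0.02}))
    print(pick_token({"the": 0.90, "a": 0.10}))
```

With a discrete algorithm, a rule like this is a few lines at the selection step, and its effect on any given output can be traced exactly.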

A lot easier to poke around a discrete algorithm, with zero uncertainty as to what is happening, vs. vast tensor models.

> It's all try stuff and see what works, and then retroactively make up some crud on why it worked

People have done this in earlier days too. The theory around control systems was developed after PID controllers had been successfully used in practice.

> It's all try stuff and see what works, and then retroactively make up some crud on why it worked, if it did work (otherwise brush it under the rug).

Reminds me of how my ex-client's data scientists would develop ML models.