Hacker News
by uuwp 2619 days ago
This proof is largely irrelevant in the real world. A more interesting question would be how much can be approximated by a model that has 1 MB worth of weights and can use only ReLU/tanh/softmax activations.

The first paragraph of the conclusion acknowledges that this is merely a proof of what is possible, not of what is practical:

> The explanation for universality we've discussed is certainly not a practical prescription for how to compute using neural networks! In this, it's much like proofs of universality for NAND gates and the like. For this reason, I've focused mostly on trying to make the construction clear and easy to follow, and not on optimizing the details of the construction. However, you may find it a fun and instructive exercise to see if you can improve the construction.