by magicalhippo 761 days ago
As mentioned, this is difficult. AFAIK the main reason is that the power of neural nets comes from the non-linear functions applied at each node ("neuron"), and thus there's nothing like the superposition principle[1] to easily combine training results.
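
To make the superposition point concrete, here is a toy sketch in plain Python. It only looks at how a single neuron responds to a sum of inputs, not at training itself. The weights `w`, the inputs `a` and `b`, and the helper names are made up for illustration.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clamps the rest to 0."""
    return max(0.0, z)

def neuron(w, x, activation=None):
    """Weighted sum of the inputs, optionally followed by a non-linearity."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return activation(z) if activation else z

# Hypothetical weights and inputs, chosen so the arithmetic is exact.
w = [1.0, -2.0]
a = [1.0, 0.0]
b = [0.0, 1.0]
a_plus_b = [ai + bi for ai, bi in zip(a, b)]

# Purely linear neuron: response to (a + b) equals the sum of the responses.
linear_sum = neuron(w, a) + neuron(w, b)
linear_joint = neuron(w, a_plus_b)

# Same neuron with ReLU: the two quantities no longer agree.
relu_sum = neuron(w, a, relu) + neuron(w, b, relu)
relu_joint = neuron(w, a_plus_b, relu)

print("linear:", linear_sum, linear_joint)
print("relu:  ", relu_sum, relu_joint)
```

With these numbers the linear neuron gives the same value both ways, while the ReLU neuron does not, which is the property that makes it hard to just add up separately computed pieces.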

The lack of superposition means you can't efficiently train one layer separately from the others either.

That being said, a popular non-linear function in modern neural nets is ReLU[2], which is piecewise linear, so perhaps there's some cleverness one can do there.
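
A rough sketch of what "piecewise linear" buys you, in plain Python: as long as two inputs switch on the same set of ReLUs, the network behaves like a single affine map between them, so averaging works; once the set of active ReLUs differs, it stops working. The tiny two-unit network below (`W1`, `b1`, `w2`) and the test points are hypothetical, picked only so the arithmetic is exact. Whether this can actually be exploited for training is left open, as in the comment above.

```python
def relu(z):
    return max(0.0, z)

# Hypothetical 2-input, 2-hidden-unit, 1-output network.
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
w2 = [2.0, -3.0]

def hidden_pre(x):
    """Pre-activation values of the hidden layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W1, b1)]

def pattern(x):
    """Which hidden ReLUs are active (strictly positive input) for x."""
    return tuple(z > 0 for z in hidden_pre(x))

def net(x):
    """Network output: linear readout of the ReLU-activated hidden layer."""
    return sum(w * relu(z) for w, z in zip(w2, hidden_pre(x)))

def midpoint(p, q):
    return [(pi + qi) / 2 for pi, qi in zip(p, q)]

x = [2.0, 1.0]
y = [4.0, 1.0]    # same activation pattern as x
z = [-2.0, 1.0]   # different activation pattern from x

# Same linear region: output at the midpoint is the average of the outputs.
same_region_ok = net(midpoint(x, y)) == (net(x) + net(y)) / 2

# Different regions: the averaging property breaks.
cross_region_ok = net(midpoint(x, z)) == (net(x) + net(z)) / 2

print(pattern(x), pattern(y), same_region_ok)
print(pattern(x), pattern(z), cross_region_ok)
```

The catch is that the number of distinct activation patterns can grow very quickly with network size, so the "linear inside each region" structure does not obviously translate into a cheap way to combine results.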

[1]: https://en.wikipedia.org/wiki/Superposition_principle

[2]: https://en.wikipedia.org/wiki/Rectifier_(neural_networks)