by mr_toad
618 days ago
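You need a non-linear activation function for the universal approximation theorem to hold. Otherwise, as others have said, the model just collapses to a single linear layer, because a composition of linear (affine) maps is itself linear. Technically the output layer is still what a statistician would call “linear in the parameters”, since it is a weighted sum of the hidden units' outputs, but because those hidden units are non-linear, the universal approximation theorem says the network as a whole can approximate any continuous function on a compact domain arbitrarily well, given enough hidden units. https://stats.stackexchange.com/questions/275358/why-is-incr...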
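For anyone who wants to see the collapse concretely, here is a minimal sketch in plain Python. The weights `W1`, `b1`, `W2`, `b2` and the helper names are made up for illustration and are not from any particular model or library. It checks that two stacked layers with no activation equal one layer with weights `W2·W1` and bias `W2·b1 + b2`, and that putting a ReLU in between breaks that equivalence.

```python
# Sketch: two affine layers without an activation collapse into one.
# All weights and helper names here are hypothetical, chosen for illustration.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def relu(v):
    """Element-wise ReLU, one common non-linear activation."""
    return [max(0.0, a) for a in v]

# Hidden layer: 2 inputs -> 3 units. Output layer: 3 units -> 1 output.
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 3.0]]
b1 = [0.5, -1.0, 0.0]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.25]

def two_layer(x, activation=None):
    """Forward pass through both layers, with an optional activation between them."""
    h = add(matvec(W1, x), b1)
    if activation is not None:
        h = activation(h)
    return add(matvec(W2, h), b2)

# The equivalent single layer: W = W2 @ W1, b = W2 @ b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def single_layer(x):
    """Forward pass through the collapsed single affine layer."""
    return add(matvec(W, x), b)

if __name__ == "__main__":
    x = [1.0, 2.0]
    print(two_layer(x))        # [-3.75]
    print(single_layer(x))     # [-3.75], identical to the two-layer output
    print(two_layer(x, relu))  # [1.25], no longer matches the single layer
```

The same argument extends to any depth: without an activation, each extra layer just multiplies in another matrix, so the whole stack stays one affine map.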