by csense
238 days ago
A few critiques:

- If you have a feature detector function (f(x) = 0 when the feature is not present, f(x) = 1 when the feature is present) and you train a network to compute f(x), or some subset of the network "decides on its own during training" to compute f(x), doesn't that create a zero set of non-zero measure if training continues long enough?
- What happens when the middle layers are of much lower dimension than the input?
- Real analyticity means infinitely many derivatives (according to Appendix A). Does this mean the results don't apply to functions with corners (e.g. ReLU)?
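
To make the first and third points concrete, here is a minimal sketch, not anything taken from the paper. It assumes a made-up one-dimensional "detector" built from two ReLUs (the names `detector` and `zero_fraction` are hypothetical) and estimates by sampling how much of the interval [-1, 1] it maps to exactly zero. A sigmoid is included for contrast, since it is real analytic and never hits zero.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def detector(x):
    # Hypothetical 1-D "feature detector" made of two ReLUs:
    # exactly 0 for x <= 0, linear ramp on (0, 1), exactly 1 for x >= 1.
    return relu(x) - relu(x - 1.0)


def sigmoid(x):
    # Real analytic comparison function; strictly positive everywhere.
    return 1.0 / (1.0 + math.exp(-x))


def zero_fraction(f, n=100_000, lo=-1.0, hi=1.0, seed=0):
    # Monte Carlo estimate of the fraction of [lo, hi] where f is exactly 0.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if f(rng.uniform(lo, hi)) == 0.0)
    return hits / n


relu_zero_frac = zero_fraction(detector)
sigmoid_zero_frac = zero_fraction(sigmoid)
print(relu_zero_frac, sigmoid_zero_frac)
```

The ReLU detector vanishes on roughly half of the sampled interval, so its zero set has positive measure without the function being identically zero. A real analytic function cannot do that, which is why the ReLU question seems to matter for whether the results carry over.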