|
|
|
|
|
by mannykannot
2802 days ago
|
|
Thanks for bringing this to my attenton. In that article, however, it says for neural nets: V is the set of nodes. Each node is a simple computation cell.
E is the set of edges, Each edge has a weight.
....
If the activation function is the sigmoid function and the weights are general, then the VC dimension is ... at most O(|E|^2.|V|^2) [apologies for the crappy formatting] While Someone's quote from the article seems to be suggesting something exponential in the number of edges. |
|
Given the huge range between that upper bound and the lower bound of Ω(|E|²), chances are that upper bound is far from tight (https://en.wikipedia.org/wiki/Upper_and_lower_bounds#Tight_b...).
Also, one line below the O(|E|².|V|²) you quoted:
”If the weights come from a finite family (e.g. the weights are real numbers that can be represented by at most 32 bits in a computer), then, for both activation functions, the VC dimension is at most O(|E|)”
Of course, they may use a different activation function, in which case that mathematical statement doesn’t apply, but I would think it’s more unlikely that applies than the claim made on the article we’re discussing.
For example, it would hugely surprise me if using an activation function that isn’t increasing or that has many large discontinuities behaves a lot better than the sigmoid surely used.