| HN Mirror

I haven’t even tried to hunt down the book referenced on Wikipedia, but I think it’s worse than that. The Wikipedia page says ”The VC dimension of a neural network is bounded as follows”. “Is bounded” is an expression in mathematics that is more about what we know about a problem, than about the problem itself (as a classical example, see https://en.wikipedia.org/wiki/Graham's_number#Context. Graham’s number ‘bounds’ a number whose value we know to be at least 13)

Given the huge range between that upper bound and the lower bound of Ω(|E|²), chances are that upper bound is far from tight (https://en.wikipedia.org/wiki/Upper_and_lower_bounds#Tight_b...).

Also, one line below the O(|E|².|V|²) you quoted:

”If the weights come from a finite family (e.g. the weights are real numbers that can be represented by at most 32 bits in a computer), then, for both activation functions, the VC dimension is at most O(|E|)”

Of course, they may use a different activation function, in which case that mathematical statement doesn’t apply, but I would think it’s more unlikely that applies than the claim made on the article we’re discussing.

For example, it would hugely surprise me if using an activation function that isn’t increasing or that has many large discontinuities behaves a lot better than the sigmoid surely used.