| HN Mirror



	by pixl97 308 days ago
	I think a lot of it is the massive amount of compute we've gained in the last decade. While inference may have been possible on the older hardware, the training would have taken lifetimes.


graemep 308 days ago

I have a textbook somewhere in the house from about 2000 that says that there is no point having more than three layers in a neural network.

Compute was just too expensive to have neural networks big enough for this not to be true.


AndrewOMartin 307 days ago

Once you have three layers (i.e. one "hidden" layer), you can approximate arbitrary continuous functions given enough hidden units, so a three-layer network has the same "power" as an arbitrarily large network.

I'm sure that's what the textbook meant, rather than any point about the expense of computing power.
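
To make the "one hidden layer" point a bit more concrete, here is a minimal sketch in Python. The weights are hand-picked for illustration, not trained, and the function name `xor_net` is made up for this example. It shows a network with a single hidden layer of two ReLU units computing XOR, which a network with no hidden layer cannot represent. It illustrates the extra power a hidden layer gives; it is not a proof of the general approximation result.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def xor_net(x, y):
    """Input layer (x, y) -> hidden layer (2 ReLU units) -> linear output.

    The weights below are chosen by hand (an assumption for illustration).
    """
    h1 = relu(x + y)          # hidden unit 1: fires when any input is on
    h2 = relu(x + y - 1.0)    # hidden unit 2: fires only when both are on
    return h1 - 2.0 * h2      # output: cancel the "both on" case


# Evaluate the network on all four binary inputs.
outputs = {(x, y): xor_net(x, y) for x in (0, 1) for y in (0, 1)}

for inputs, value in sorted(outputs.items()):
    print(inputs, value)
```

The catch, as far as I understand it, is that the number of hidden units needed can grow very quickly for harder functions, which is one reason depth still matters in practice.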


leumassuehtam 308 days ago

People believed that more parameters would lead to overfitting instead of generalization. The various regularization methods we use today to avoid overfitting hadn't been discovered yet. Your statement is most likely about this.


Silphendio 307 days ago

I think the problems with big networks were vanishing gradients, which is why we now use the ReLU activation function, and training stability, which was solved with residual connections.
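
A rough sketch of the gradient issue, under heavy simplifying assumptions: each "layer" is a single scalar unit with its weight fixed at 1, and the helper `input_gradient` is a made-up name for this example. It multiplies the local derivatives along a 20-layer chain, which is what backpropagation does, and compares sigmoid, ReLU, and sigmoid with a skip connection.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)      # never larger than 0.25


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def input_gradient(act, act_grad, depth, x, residual=False):
    """Return d(output)/d(input) for a chain of `depth` scalar layers.

    Plain layer:    h = act(h)
    Residual layer: h = h + act(h)
    All weights are fixed at 1 (an assumption to keep the sketch tiny).
    """
    h, grad = x, 1.0
    for _ in range(depth):
        local = act_grad(h)
        if residual:
            grad *= 1.0 + local   # the skip path contributes the "1"
            h = h + act(h)
        else:
            grad *= local
            h = act(h)
    return grad


DEPTH = 20
g_sigmoid = input_gradient(sigmoid, sigmoid_grad, DEPTH, 0.5)
g_relu = input_gradient(relu, relu_grad, DEPTH, 0.5)
g_residual = input_gradient(sigmoid, sigmoid_grad, DEPTH, 0.5, residual=True)

print("sigmoid chain: ", g_sigmoid)
print("ReLU chain:    ", g_relu)
print("residual chain:", g_residual)
```

Since every sigmoid factor is at most 0.25, the product shrinks geometrically with depth, while the ReLU factor is exactly 1 for positive inputs and the skip connection keeps each factor at 1 or above. Real networks have weight matrices and many units per layer, so this only shows the basic mechanism.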

Overfitting is the problem of having too little training data for your network size.
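
A small illustration of that "too little data for the model size" point, using polynomial fitting as a stand-in for a neural network (a swapped-in technique, chosen because it fits in a few lines). The data, the fixed "noise" values, and the helper names are all invented for this sketch. A 6-parameter polynomial fits 6 noisy training points perfectly but does worse on held-out points than a 2-parameter line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line (2 parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point (as many parameters as points)."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for k, xk in enumerate(xs):
                if k != j:
                    basis *= (x - xk) / (xj - xk)
            total += yj * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# True relationship is y = x; training targets carry fixed, made-up noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
train_y = [x + e for x, e in zip(train_x, noise)]

# Held-out points between the training points, with noise-free targets.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)

line = fit_line(train_x, train_y)
poly = fit_interpolant(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
poly_train, poly_test = mse(poly, train_x, train_y), mse(poly, test_x, test_y)

print("line: train", line_train, "test", line_test)
print("poly: train", poly_train, "test", poly_test)
```

The bigger model memorizes the noise, so its training error is essentially zero while its held-out error is much worse than the line's. With far more training points than parameters, the gap would be expected to shrink.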


graemep 308 days ago

Possibly, I would have to dig up the book to check. IIRC it did not mention overfitting but it was a long time ago.
