by supple-mints 717 days ago
Is it harder to train the wider network or the deeper network, all else equal?
1 comment

Post author here. If you look at MFU (model FLOPs utilization), the wider layers win out, and init takes much longer the more layers you add.
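For anyone who wants to see what the MFU comparison means, here is a rough sketch, not the setup from the post. It assumes a plain stack of square linear layers, ignores biases, and uses the common rule of thumb of about 6 training FLOPs per weight per token. The function names and the two example shapes are made up for illustration.

```python
def mlp_params(width: int, depth: int) -> int:
    """Weight count for `depth` square linear layers of size width x width (biases ignored)."""
    return depth * width * width


def train_flops_per_token(width: int, depth: int) -> int:
    """Approximate training FLOPs per token: ~2 per weight forward, ~4 backward."""
    return 6 * mlp_params(width, depth)


def mfu(tokens_per_sec: float, flops_per_token: float, peak_flops_per_sec: float) -> float:
    """Model FLOPs utilization: FLOPs the model actually needs per second / hardware peak."""
    return tokens_per_sec * flops_per_token / peak_flops_per_sec


# Two hypothetical shapes with the same parameter budget.
wide = (4096, 4)    # 4 large matmuls per forward pass
deep = (1024, 64)   # 64 small matmuls per forward pass, run one after another

assert mlp_params(*wide) == mlp_params(*deep)
assert train_flops_per_token(*wide) == train_flops_per_token(*deep)
```

Since both shapes need the same FLOPs per token, any MFU gap between them comes entirely from measured tokens per second. The deep shape issues 16x as many sequential matmuls, each 16x smaller, which is one plausible reason the wider layers come out ahead.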