Hacker News
by marcosdumay 2352 days ago
And, of course, there is nothing that says this would work for intermediate layers, since the sample dimensions may get there from any input.

What works is averaging similar networks, and doing that averaging many times.
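
To make that concrete, here is a minimal sketch of one reading of "averaging similar networks": element-wise averaging of the parameters of networks that share the same layout. This is an assumption on my part; the comment could just as well mean averaging the networks' predictions. The `average_networks` helper and the toy `net_a` / `net_b` dicts are hypothetical, not taken from any library.

```python
def average_networks(networks):
    """Element-wise mean of parameters across networks with the same layout.

    Each network is a dict mapping a parameter name to a flat list of floats.
    (Hypothetical representation, chosen only to keep the sketch dependency-free.)
    """
    if not networks:
        raise ValueError("need at least one network")
    names = networks[0].keys()
    for net in networks:
        # "Similar" here means: same parameter names and same sizes.
        if net.keys() != names:
            raise ValueError("networks must share the same parameter names")
        for name in names:
            if len(net[name]) != len(networks[0][name]):
                raise ValueError(f"size mismatch for parameter {name!r}")
    averaged = {}
    for name in names:
        # Group the i-th value of this parameter from every network, then average.
        columns = zip(*(net[name] for net in networks))
        averaged[name] = [sum(col) / len(networks) for col in columns]
    return averaged


# Two toy networks with the same layout and slightly different weights.
net_a = {"w": [1.0, 2.0], "b": [0.0]}
net_b = {"w": [3.0, 4.0], "b": [1.0]}
merged = average_networks([net_a, net_b])
```

Doing the averaging "a lot of times" would then amount to calling something like this repeatedly during training (for example on successive checkpoints) rather than once at the end.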