by londons_explore 2176 days ago
I experimented with networks where weights were removed if they did not contribute much to the final answer.

My conclusion: after enough training, I could easily set >99% of the weights in my (fully connected) layers to zero with minimal performance impact. But training time went up a lot, because after removing a bunch of connections you have to do more training before you can remove more. And inference speed wasn't really improved, because sparse matrices are sloooow.
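
For anyone wanting to picture a single pruning step, here is a rough sketch. It assumes "did not contribute much" means "small absolute value" (magnitude pruning), which the comment does not actually specify. The names `prune_smallest` and `sparsity` are made up for illustration, and the weights are a plain flat list rather than a real layer.

```python
def prune_smallest(weights, mask, fraction):
    """Return a new 0/1 mask with `fraction` of the still-active weights
    switched off, choosing the ones with the smallest absolute value.

    ASSUMPTION: small |w| stands in for "does not contribute much".
    """
    active = [i for i, m in enumerate(mask) if m]
    n_drop = int(len(active) * fraction)
    # Active indices ordered by absolute weight, smallest first.
    to_drop = sorted(active, key=lambda i: abs(weights[i]))[:n_drop]
    new_mask = list(mask)
    for i in to_drop:
        new_mask[i] = 0
    return new_mask


def sparsity(mask):
    """Fraction of weights that are currently masked out."""
    return 1.0 - sum(mask) / len(mask)


weights = [0.5, -0.01, 0.2, 0.03, -0.9, 0.001, 0.4, -0.05, 0.07, 0.6]
mask = prune_smallest(weights, [1] * len(weights), 0.5)
```

In a real setup the mask would be multiplied into the layer's weights on every forward pass, and the surviving weights would be trained further before the next call.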

Overall, while it works out for biology, I don't think it will work for silicon.

1 comments

Would you say you found a result similar to the lottery ticket hypothesis? https://arxiv.org/abs/1803.03635
Not really - I had to do multiple steps of 'prune a bit, train a bit' to be able to prune to 99%. If I had done all the pruning in one big step as they do, I don't think it would have trained well, even if I had been able to see the future and remove the same weights.
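
To give a feel for why 'prune a bit, train a bit' costs so much training time, here is a small back-of-the-envelope sketch. The per-round pruning fractions are assumptions picked for illustration (the comment gives no numbers), and `rounds_to_target` is a hypothetical helper, not anything from the paper.

```python
def rounds_to_target(per_round_fraction, target_sparsity):
    """Count prune-then-retrain rounds needed to reach `target_sparsity`
    when each round removes `per_round_fraction` of the remaining weights.

    ASSUMPTION: a fixed fraction per round; real schedules may vary it.
    """
    if not 0.0 < per_round_fraction <= 1.0:
        raise ValueError("per_round_fraction must be in (0, 1]")
    remaining = 1.0  # fraction of weights still active
    rounds = 0
    while 1.0 - remaining < target_sparsity:
        remaining *= 1.0 - per_round_fraction
        rounds += 1
    return rounds


gentle = rounds_to_target(0.2, 0.99)      # remove 20% of what is left each round
aggressive = rounds_to_target(0.5, 0.99)  # remove half of what is left each round
one_shot = rounds_to_target(1.0, 0.99)    # degenerate case: a single round
```

Under these assumptions, every round is followed by its own stretch of retraining, so a gentler per-round fraction means more rounds and more total training.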