by yorwba
167 days ago
What evidence against it do you have in mind? I think it's a result of little practical relevance without a way to identify winning tickets that doesn't require buying lots of tickets until you hit the jackpot (i.e. training a large, dense model to completion), but that doesn't make the observation itself incorrect.
https://youtu.be/WW1ksk-O5c0?list=PLCq6a7gpFdPgldPSBWqd2THZh... (timestamped)
At the timestamp they discuss how the original ICLR results actually only held for extremely tiny models and did not carry over to larger ones. The adaptation needed to more or less fix it is to train densely for a few epochs first, and only then start increasing sparsity.
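
To make the "dense first, then sparsify" idea concrete, here is a rough sketch of what such a schedule could look like. Everything specific in it is my own assumption for illustration, not something taken from the talk or the paper: the names `sparsity_at_epoch` and `magnitude_mask`, the 3 warmup epochs, the linear ramp, and the 0.9 final sparsity are all hypothetical.

```python
def sparsity_at_epoch(epoch, warmup_epochs=3, ramp_epochs=10, final_sparsity=0.9):
    """Target fraction of pruned weights at a 0-indexed epoch.

    Assumed schedule: fully dense during warmup, then a linear ramp
    up to final_sparsity over ramp_epochs, then constant.
    """
    if epoch < warmup_epochs:
        return 0.0  # dense phase: nothing is pruned yet
    progress = min(1.0, (epoch - warmup_epochs + 1) / ramp_epochs)
    return final_sparsity * progress


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]
```

In a training loop you would recompute the mask at the start of each epoch from `sparsity_at_epoch(epoch)` and multiply it into the weights, so the first few epochs run with an all-ones mask. Whether a linear ramp is the right shape is a separate question; it is just the simplest thing to write down.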