by subtypefiddler
2053 days ago
It's also important to note that they work despite being wide. You can see that in how effective pruning is, and in ideas such as the lottery ticket hypothesis, which states that "successful" sub-networks within the wide network account for most of the performance. In the theory literature, for a K-deep network, K=1 is the shallow case and K>1 is deep. Agreed, the naming could be better, but it's not "deep" in the sense of "deep work" or "deep thoughts", as the parent was stating.