by yoquan
1564 days ago
Actually no. Each layer requires the output of the previous one, which means the layers have to be computed sequentially, whereas wider layers can make better use of GPU parallelism. So it's a trade-off between less memory (fewer parameters) and longer compute time.
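To put rough numbers on that trade-off, here is a minimal sketch that counts parameters and sequential matmul steps for two fully connected networks. The layer sizes and the `mlp_stats` helper are made up for illustration, and counting steps is only a crude proxy for latency, not a benchmark.

```python
def mlp_stats(layer_sizes):
    """Return (parameter count, sequential matmul steps) for a dense MLP.

    layer_sizes lists the width of every layer, input and output included.
    """
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    # Each dense layer has n_in * n_out weights plus n_out biases.
    params = sum(n_in * n_out + n_out for n_in, n_out in pairs)
    # Each layer needs the previous layer's output, so the steps cannot overlap.
    return params, len(pairs)


# Hypothetical sizes: one wide hidden layer vs. eight narrow hidden layers.
wide = [256, 4096, 256]
deep = [256] + [256] * 8 + [256]

for name, sizes in (("wide", wide), ("deep", deep)):
    params, steps = mlp_stats(sizes)
    print(f"{name}: {params} parameters, {steps} sequential steps")
```

With these sizes the deep network has far fewer parameters but more than four times as many dependent steps, while the wide one does most of its work in two large matmuls that parallelize well on a GPU.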