by yoquan 1564 days ago
Actually, no. Each layer needs the output of the previous one, so the layers have to be computed sequentially. Wider layers, on the other hand, make better use of the GPU's parallel computation. It's a trade-off between less memory (fewer parameters) and longer computation time.
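A rough sketch of the trade-off, assuming plain fully connected layers. The layer sizes below are made up for illustration, and the function only counts parameters and the number of layers that must run one after another. It ignores activation memory and actual GPU throughput.

```python
from typing import List, Tuple


def mlp_cost(widths: List[int]) -> Tuple[int, int]:
    """Return (parameter count, sequential layer steps) for a dense MLP.

    widths[i] -> widths[i + 1] is one fully connected layer (weights + bias).
    """
    params = 0
    for w_in, w_out in zip(widths, widths[1:]):
        params += w_in * w_out + w_out  # weight matrix + bias vector
    # Layer k needs layer k-1's output, so layers cannot overlap in time:
    # the number of sequential steps equals the number of layers.
    # Work *inside* one layer (a single matmul) is what the GPU parallelizes.
    sequential_steps = len(widths) - 1
    return params, sequential_steps


# Hypothetical example shapes: deep/narrow vs. shallow/wide.
deep = [1024, 256, 256, 256, 256, 10]
wide = [1024, 4096, 10]

print(mlp_cost(deep))  # (462346, 5)
print(mlp_cost(wide))  # (4239370, 2)
```

The deep, narrow net has roughly 9x fewer parameters but needs 5 dependent steps instead of 2. Whether that actually ends up slower depends on how well each small matmul fills the GPU.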