Hacker News
imranq, 481 days ago
Dang, only forward passes. The real secret was in the backward pass! I was also curious to learn how they implemented the DualPipe scheduler.
rfoo, 481 days ago
Do they even have an optimized backward pass? It looks like optimizations like this aren't needed during training, and their V2 paper suggests as much.