Hacker News
by imranq 481 days ago
Dang, only forward passes. The real secret was in the backward pass! I was also curious to learn how they implemented the DualPipe scheduler.

Do they even have an optimized backward pass? It looks like optimizations like this aren't needed during training. Their V2 paper suggests as much.