Hacker News
by rfoo, 483 days ago
Do they even have an optimized backward pass? It looks like optimizations like this aren't needed during training. Their V2 paper suggests as much.