by spwa4
180 days ago
Why not recompile every iteration? Weights are updated at the end of a batch at the earliest; with distributed training, every n batches at the earliest, and generally only at the end of an iteration.
In either case the cost of recompiling would be negligible, no?
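Whether recompiling is "negligible" comes down to how much compute the one-off compile cost is amortized over. The sketch below is a toy timing model, not a measurement. The function name `recompile_overhead` and the figures of 30 s per compile and 0.5 s per step are made-up assumptions, chosen only to show the shape of the trade-off.

```python
# Toy timing model. All numbers are illustrative assumptions, not benchmarks.

def recompile_overhead(compile_s: float, step_s: float, steps_per_recompile: int) -> float:
    """Fraction of wall-clock time spent compiling when the graph is
    recompiled once every `steps_per_recompile` training steps."""
    compute_s = step_s * steps_per_recompile  # useful work between recompiles
    return compile_s / (compile_s + compute_s)

# Assumed: 30 s per compile, 0.5 s per training step.
for n in (1, 100, 10_000):
    print(n, f"{recompile_overhead(30.0, 0.5, n):.4f}")
```

Under these assumptions, recompiling before every single step spends about 98% of the time compiling. Recompiling once every 10,000 steps drops that to under 1%. So the argument holds only when updates, and therefore recompiles, are rare relative to the compile cost.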
So the killer cost is at compile time, not runtime, and that is fundamental to the underlying autograd operation.
On the flip side, it's 2025, not 2006, so more modern algorithms & heuristics can change this story quite a bit.
All of this is spelled out in Griewank's work (the book).