|
|
|
|
|
by hodgehog11
237 days ago
|
|
I'm a bit confused by this; are you referring to vanishing/exploding gradients during training or iteration at inference? If the former, this is only true if you take too many steps. If the latter, we already know this works and scales well. |
|
The approach of “try a few more things before stopping” is a great strategy akin to taking a few more stabs at RNG. It’s not the same as saying keep trying until you get there - you won’t.