by yobbo
280 days ago
Yes, the purpose is to verify the gradient computations, which are typically incorrect on the first try for things like self-attention and softmax. It is very slow, since it has to re-evaluate the loss for each parameter it checks. It would not be necessary with auto-differentiation, but this project does not use that.
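For anyone unfamiliar with the idea, here is a rough sketch of a gradient check using central finite differences on a softmax cross-entropy loss. This is not the project's code: the function names (`softmax`, `loss`, `analytic_grad`, `numeric_grad`, `max_rel_error`), the step size `eps`, and the use of NumPy are all my own assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def loss(x, target):
    # Cross-entropy of softmax(x) against a single target class index.
    return -np.log(softmax(x)[target])

def analytic_grad(x, target):
    # Hand-derived gradient of the loss w.r.t. x: softmax(x) - one_hot(target).
    g = softmax(x)
    g[target] -= 1.0
    return g

def numeric_grad(f, x, eps=1e-5):
    # Central differences: two evaluations of f per element of x,
    # which is why this kind of check is slow on large models.
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x[i]
        x[i] = old + eps
        f_plus = f(x)
        x[i] = old - eps
        f_minus = f(x)
        x[i] = old  # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * eps)
    return grad

def max_rel_error(a, b):
    # Largest element-wise relative difference between two gradients.
    return np.max(np.abs(a - b) / np.maximum(1e-12, np.abs(a) + np.abs(b)))

rng = np.random.default_rng(0)
x = rng.normal(size=5)
target = 2
err = max_rel_error(analytic_grad(x, target),
                    numeric_grad(lambda v: loss(v, target), x))
print("max relative error:", err)
```

If the hand-written gradient is right, the relative error should be tiny; a typical mistake (say, forgetting to subtract 1 at the target index) shows up as a large error.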