by dgacmu
3322 days ago
You also need a few other operations for training, such as transpose, which may or may not be fast in a particular implementation. (ETA: In case it's not obvious, I'm agreeing with david-gpu's comment, and adding more reasons that training currently differs from inference.)
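To make the transpose point concrete, here's a rough sketch of a single dense layer in plain Python. It's only an illustration under my own assumptions: the helper names (`matvec`, `transpose`, `dense_forward`, `dense_backward`) are made up for this example and don't come from any particular framework. The forward pass only ever multiplies by `W`, but the backward pass needs a multiply by the transpose of `W`, which is the extra operation inference never touches.

```python
def matvec(W, x):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*W)]


def dense_forward(W, x):
    """Inference path: y = W x. No transpose required."""
    return matvec(W, x)


def dense_backward(W, x, grad_y):
    """Training path: gradients with respect to the input and the weights.

    grad_x = W^T grad_y   (this is where the transpose shows up)
    grad_W = grad_y x^T   (outer product)
    """
    grad_x = matvec(transpose(W), grad_y)
    grad_W = [[gy * xi for xi in x] for gy in grad_y]
    return grad_x, grad_W


# Hypothetical 2x3 weight matrix and input, chosen only for illustration.
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]

y = dense_forward(W, x)
grad_x, grad_W = dense_backward(W, x, [1.0, 1.0])
```

A real implementation would probably avoid materializing the transpose and instead use a matrix-multiply routine that reads `W` in transposed order, and whether that path is as fast as the forward one depends on the hardware and library.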