|
|
|
|
|
by seanhunter
130 days ago
|
|
I read that too, but I wondered whether elementwise error is the right metric. Surely the actual error metric should be to evaluate model performance for a conventional transformer model and then the same model with the attention mechanism replaced by this 4th order Taylor approximation? |
|