by jostmey
3389 days ago
EDIT: The attention mechanism is nothing more than a weighted average. The weighted average is computed as a running average: the numerator and denominator terms are saved and updated at each step. I hope the description in the paper is clear. You can follow the arXiv link in the README. Skip straight to section 2 for the details of the model.
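To make the "running average" idea concrete, here is a minimal sketch in plain Python. It is not the model from the paper. It assumes the attention scores `a_t` and the values `z_t` are already given at each step (in a real model they would be computed from the input and hidden state), and it uses `exp(a_t)` as the weight. The function names are hypothetical. Only the numerator, the denominator and a running maximum score are carried between steps. Subtracting the running maximum is a common numerical-stability trick I added here, not something stated above; without it the updates reduce to `num += exp(a_t) * z_t` and `den += exp(a_t)`.

```python
import math
from typing import List, Sequence


def running_weighted_average(scores: Sequence[float],
                             values: Sequence[Sequence[float]]) -> List[List[float]]:
    """Weighted average of values[0..t] at every step t, with weights exp(scores[t]).

    Hypothetical helper: only the numerator, denominator and running max score
    are kept between steps, so each step costs O(dim) regardless of t.
    """
    dim = len(values[0])
    num = [0.0] * dim  # running sum of exp(a_i - m) * z_i
    den = 0.0          # running sum of exp(a_i - m)
    m = -math.inf      # running max score (assumed stability trick)
    outputs = []
    for a, z in zip(scores, values):
        new_m = max(m, a)
        # Rescale old sums to the new max; exp(-inf) == 0.0 on the first step.
        scale = math.exp(m - new_m)
        w = math.exp(a - new_m)
        num = [n * scale + w * zi for n, zi in zip(num, z)]
        den = den * scale + w
        m = new_m
        outputs.append([n / den for n in num])
    return outputs


def batch_weighted_average(scores: Sequence[float],
                           values: Sequence[Sequence[float]]) -> List[float]:
    """Reference: recompute the full weighted average from scratch."""
    weights = [math.exp(a) for a in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * z[k] for w, z in zip(weights, values)) / total
            for k in range(dim)]
```

Each entry of `running_weighted_average(scores, values)` should match `batch_weighted_average` on the same prefix up to floating-point rounding. The running version never revisits earlier steps, which is the point of saving the numerator and denominator.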