by machinelearning
613 days ago
This is a good problem to solve, but the approach is wrong imo. It has to be done hierarchically, so that the correction step knows both what was attended to and the full context. If the differential vector is computed from the same input as the attention vector, how do you know how to modify the attention vector correctly?