by george3d6
2270 days ago
> l (feed forward network) won't do great.

See my answer below: for this problem, a generic feed-forward network, even a simple one, will work. Not just any FFN, but if you are using an efficient architecture search, it will probably find one that works. There are other numerical problems where this doesn't hold, but that's another story.
x_i = ith list element from list x
y = sum(x_i * softmax(k * x)_i)
This one-parameter network (k is the only parameter, and x can be arbitrarily wide) gets arbitrarily close to the max function as k grows.
This is a super toy version of why attention is so effective. It can pick stuff.
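
To make the formula concrete, here is a rough sketch in plain Python. The function name `soft_max_pool` and the sample list are made up for illustration, and k is set by hand rather than learned. At k = 0 the weights are uniform, so you get the mean; as k grows, nearly all the weight lands on the largest element.

```python
import math

def soft_max_pool(x, k):
    """Hypothetical helper: returns sum_i x_i * softmax(k * x)_i for a list x."""
    # Subtract the largest scaled value before exponentiating so exp() cannot overflow.
    m = max(k * v for v in x)
    weights = [math.exp(k * v - m) for v in x]
    total = sum(weights)
    # Weighted average of x, with softmax(k * x) as the weights.
    return sum(v * w / total for v, w in zip(x, weights))

# Made-up example input; the true max is 3.0.
x = [0.5, -1.0, 3.0, 2.0]
for k in (0.0, 1.0, 10.0, 100.0):
    print(k, soft_max_pool(x, k))
```

The printed values climb from the mean of the list toward 3.0 as k increases, which is the "picking" behaviour in miniature.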