| HN Mirror

You demonstrated it for a reeeeeeeallly constrained version of the problem. Do you expect your solution would generalize to many lists? Because it would be easy to make a neural network that does, while your toy example (and larger generalizations) probably won't generalize super well.

x_i = the i-th element of list x
k = a single scalar parameter

y = sum_i x_i * softmax(k * x)_i

This one-parameter network works on lists of any length, and as k grows, y gets arbitrarily close to the max function.
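A minimal Python sketch of the formula above, assuming plain lists of floats. The function name `soft_max_pool` and the sample list are illustrative, not from the comment. It uses the standard trick of subtracting the largest score before exponentiating, which leaves softmax unchanged but avoids overflow for large k. k is the only parameter, and the same function accepts lists of any length.

```python
import math

def soft_max_pool(xs, k):
    """Return sum_i x_i * softmax(k * x)_i for a non-empty list xs."""
    # Shift scores by their maximum before exp() for numerical stability;
    # softmax is invariant to adding the same constant to every input.
    m = max(k * x for x in xs)
    weights = [math.exp(k * x - m) for x in xs]
    total = sum(weights)
    # Softmax-weighted average of the elements (a convex combination).
    return sum(x * w for x, w in zip(xs, weights)) / total

xs = [3.0, -1.0, 2.5, 0.0]
for k in (0.0, 1.0, 10.0, 100.0):
    print(k, soft_max_pool(xs, k))
```

At k = 0 the weights are uniform and y is just the mean. As k grows, the weights concentrate on the largest element, so y rises toward max(x). Since y is always a weighted average, it stays strictly below the max for finite k unless every element is equal.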

This is a super toy version of why attention is so effective: softmax weighting lets it pick stuff out.
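For comparison, here is a toy single-query scaled dot-product attention in plain Python, a sketch under simplifying assumptions: the keys, values, and query are hypothetical, the values are scalars for brevity, and there are no learned projections. The scores play the role of k * x above. Scaling the query up sharpens the softmax, so the output picks out the value whose key best matches the query.

```python
import math

def attend(query, keys, values):
    """Single-query scaled dot-product attention over scalar values.

    Weights are softmax(q . key_j / sqrt(d)); the output is the
    weighted sum of the values.
    """
    d = len(query)
    scores = [sum(q * kj for q, kj in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Same stable-softmax shift as before.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = sum(w * v for w, v in zip(weights, values))
    return out, weights

keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [10.0, 20.0, 30.0]
for scale in (1.0, 10.0, 100.0):
    # Query points along the first key; larger scale = sharper selection.
    out, weights = attend([scale, 0.0], keys, values)
    print(scale, out, weights)
```

With a small query the output blends all three values; with a large one nearly all the weight lands on the first key, and the output is essentially 10.0.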