Hacker News
by derbOac 3 days ago
You might be completely correct, although my hunch is that this would require a change in architecture rather than an increase in scale.

The failures occur in a fairly simple task (Stroop) as the number of repeated trials increases. It's not as if the number of colors or color words is growing, which is the sort of change I would expect to matter if the problem were tied to the size of the LLM.
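To make the distinction concrete, here is a minimal Python sketch of a Stroop run in which only the trial count grows while the color vocabulary stays fixed. The four colors, the 50% congruent rate, and every name in it (`make_trials`, `accuracy`, the word-reading baseline) are hypothetical; the comment does not say how the paper built its trials.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical fixed stimulus vocabulary: it does not grow with the number of trials.
COLORS = ("red", "green", "blue", "yellow")


@dataclass(frozen=True)
class StroopTrial:
    word: str  # the color word that is shown
    ink: str   # the color it is printed in

    @property
    def congruent(self) -> bool:
        return self.word == self.ink

    @property
    def answer(self) -> str:
        # The task is to name the ink color and ignore the word.
        return self.ink


def make_trials(n_trials: int, p_congruent: float = 0.5, seed: int = 0) -> List[StroopTrial]:
    """Generate n_trials Stroop trials, all drawn from the same fixed COLORS set."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        ink = rng.choice(COLORS)
        if rng.random() < p_congruent:
            word = ink
        else:
            # Incongruent: the word names a different color than the ink.
            word = rng.choice([c for c in COLORS if c != ink])
        trials.append(StroopTrial(word=word, ink=ink))
    return trials


def accuracy(trials: Sequence[StroopTrial], responses: Sequence[str]) -> float:
    """Fraction of responses that name the ink color."""
    if not trials or len(trials) != len(responses):
        raise ValueError("need at least one trial and exactly one response per trial")
    correct = sum(t.answer == r for t, r in zip(trials, responses))
    return correct / len(trials)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        trials = make_trials(n)
        # A hypothetical "word reader" that answers with the word instead of the ink.
        word_reader = [t.word for t in trials]
        print(n, accuracy(trials, word_reader))
```

A responder that reads the word instead of naming the ink is right only on congruent trials, so its accuracy sits near `p_congruent` no matter how many trials are run.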

On the other hand, who knows. I agree that changes in model scale make a lot of things a moving target.

At first I thought this paper was kind of odd, but then I felt it might be onto something important. Intuitively, I could see how whatever is causing this failure in the Stroop task might be related to the tendency of LLMs to be "derailable".

Aren't transformers universal function approximators? It seems pretty easy to see executive function as a simple computation. So it would be trivially true that a sufficiently large transformer could model executive function, because it could approximate [current transformer] + [an approximation of the executive function algorithm] + [whatever bloat is needed to store state in a transformer].
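The universal-approximation idea can be made concrete in its simplest setting: a one-hidden-layer ReLU network (not a transformer) that approximates a continuous function on an interval by placing one hidden unit at each knot of a piecewise-linear interpolant. This is a minimal Python sketch; `fit_relu_net` and `eval_relu_net` are hypothetical names, and the target function and interval are arbitrary choices.

```python
import math
from typing import Callable, List, Tuple

# A network is (output bias, [(output weight, knot), ...]); each hidden unit computes relu(x - knot).
Net = Tuple[float, List[Tuple[float, float]]]


def relu(z: float) -> float:
    return max(0.0, z)


def fit_relu_net(f: Callable[[float], float], a: float, b: float, n: int) -> Net:
    """Build a one-hidden-layer ReLU net that linearly interpolates f at n+1 evenly spaced points on [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n)]
    units = []
    prev_slope = 0.0
    for i in range(n):
        # The unit at xs[i] switches on there and changes the running slope to slopes[i].
        units.append((slopes[i] - prev_slope, xs[i]))
        prev_slope = slopes[i]
    return ys[0], units


def eval_relu_net(net: Net, x: float) -> float:
    bias, units = net
    return bias + sum(w * relu(x - knot) for w, knot in units)


if __name__ == "__main__":
    for n in (4, 16, 64):
        net = fit_relu_net(math.sin, 0.0, math.pi, n)
        grid = [math.pi * k / 1000 for k in range(1001)]
        err = max(abs(eval_relu_net(net, x) - math.sin(x)) for x in grid)
        print(n, "hidden units, max error", err)
```

For a smooth target, doubling the number of hidden units roughly quarters the worst-case error, which is the sense in which such a network "can approximate" the function; it says nothing about how large a transformer would need to be for executive function.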

It seems hard to come up with an argument that executive function can't possibly be approximated with an algorithm. Executive function is basic once the part of the process that clusters input into objects is done. The only real questions are whether a transformer of sufficient scale is feasible on current hardware and whether the engineers with access to that hardware have figured out what to train for yet.

> whether a transformer of sufficient scale is feasible on current hardware

Just because a particular approach (such as a neural network or transformer) can approximate something doesn't necessarily mean it can do so efficiently. However, I share what I infer to be your suspicion that executive function can in fact be modeled easily. I think it's likely to emerge on its own, depending on the training methodology used.
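The efficiency point can be illustrated with a textbook bound rather than anything transformer-specific. For piecewise-linear interpolation with uniform spacing h, the error is at most h² · max|f''| / 8, so halving the tolerance multiplies the pieces needed by about √2 for one input, while a naive grid over d inputs needs that count raised to the d-th power. This is a minimal Python sketch; the target sin on [0, π] and the names `pieces_needed`, `grid_cells`, and `max_interp_error` are illustrative assumptions.

```python
import math
from typing import Callable


def pieces_needed(tol: float, length: float = math.pi, max_curvature: float = 1.0) -> int:
    """Uniform linear pieces needed so that h**2 * max|f''| / 8 <= tol on an interval of this length."""
    return math.ceil(length * math.sqrt(max_curvature / (8 * tol)))


def grid_cells(tol: float, dim: int) -> int:
    """Cells in a naive tensor-product grid using the 1-D spacing along each of dim inputs."""
    return pieces_needed(tol) ** dim


def max_interp_error(f: Callable[[float], float], a: float, b: float, n: int,
                     samples_per_piece: int = 50) -> float:
    """Empirical max error of linear interpolation of f with n uniform pieces (sampled, so a lower bound)."""
    h = (b - a) / n
    worst = 0.0
    for i in range(n):
        x0 = a + i * h
        y0, y1 = f(x0), f(x0 + h)
        for k in range(samples_per_piece + 1):
            t = k / samples_per_piece
            approx = (1 - t) * y0 + t * y1
            worst = max(worst, abs(f(x0 + t * h) - approx))
    return worst


if __name__ == "__main__":
    for tol in (1e-2, 1e-4):
        n = pieces_needed(tol)
        print(tol, n, max_interp_error(math.sin, 0.0, math.pi, n), grid_cells(tol, 3))
```

Going from a tolerance of 1e-2 to 1e-4 takes 12 to 112 pieces for one input, but a naive three-input grid at 1e-2 already needs 12**3 = 1728 cells.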