| From the article: "We currently don't understand how to make sense of the neural activity within language models." "Unlike with most human creations, we don’t really understand the inner workings of neural networks." "The [..] networks are not well understood and cannot be easily decomposed into identifiable parts" "[..] the neural activations inside a language model activate with unpredictable patterns, seemingly representing many concepts simultaneously" "Learning a large number of sparse features is challenging, and past work has not been shown to scale well." etc., etc., etc. People say we don't (currently) know why they output what they output because, as the article clearly states, we don't. |
A good example would be planes: it took a long while to develop mathematical models that could describe their behavior. Meanwhile, practical experimentation produced decent rules of thumb for what worked and what did not.
So I don't think it's fair to say that "we don't" (know how neural networks work). What we don't have yet is the math / models that can explain or predict their behavior...