That's not how LLMs work. LLMs complete documents; they don't make statements about LLMs unless you explain to them how to do so and give them all the information they need. If you could extract the information from an LLM well enough to hand it to another LLM, along with an explanation of how to summarize the first LLM's behaviour for a human, we would already have handed it to a PhD student instead. A PhD student is a bit slower than an LLM, but needs a lot less explanation.

In any case, examining and understanding how a neural network encodes information is a lot like gene editing. Perhaps you could isolate a gene in the human genome that does something interesting, like giving a child blue eyes. But even if you did, there's a chance that modifying that gene breaks something else and exposes the child to health risks. Because the neurons in a deep neural network are densely interconnected, changes propagate in a butterfly effect that makes these networks inherently something of a black box.