| No. LLMs with soft attention use compression and actually have no mechanism for ground truth. They are simply pattern finding and matching. More precisely, they are uniform constant-depth threshold circuits (TC^0): basically parallel operations on a polynomial number of AND, OR, NOT, and majority gates. The majority gates can compute the parity function, but cannot self-correct like ECC does. The thing with majority gates is that they can show that some input is in the language: thus 1,1,1,0,0 comes out true, while 1,1,0,0,0 comes out false, but only as negation by failure; it doesn't prove the negation, so it isn't a truthy false. With soft attention and majority gates they can do parity detection but not correction. Hopefully someone can correct this if I am wrong. Specifically, I think that the complexity of deciding whether X = x is a cause of m) in structures is NP-complete in binary models (where all variables can take on only two values) and Σ_2^P-complete in general models. As TC^0 sits inside NP, and is probably smaller than P, any methods would be opportunistic at best. Preserving the long tail of a distribution is a far more pragmatic direction, as an ECC-type ability is unreasonable. Thinking of error-correcting codes as serial Turing machines and transformers as primarily parallel circuits should help with understanding why they are very different. |
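
To make the gate argument concrete, here is a minimal Python sketch of a majority gate, parity built from threshold gates in constant depth, and a single parity bit that detects a flipped bit without being able to say which one. The function names (`threshold`, `majority`, `parity_via_thresholds`, `add_parity_bit`, `parity_check`) are made up for illustration, and this is a toy model of the circuit idea, not a model of how any transformer is actually implemented. It assumes general threshold gates, which can be built from majority gates by padding with constant inputs.

```python
from typing import List, Sequence


def threshold(bits: Sequence[int], k: int) -> int:
    """Threshold gate: outputs 1 if at least k of the inputs are 1."""
    return int(sum(bits) >= k)


def majority(bits: Sequence[int]) -> int:
    """Majority gate: outputs 1 if strictly more than half the inputs are 1."""
    return threshold(bits, len(bits) // 2 + 1)


def parity_via_thresholds(bits: Sequence[int]) -> int:
    """Parity as a constant-depth circuit over threshold gates.

    'Exactly k ones' is threshold(k) AND NOT threshold(k + 1);
    parity is the OR of 'exactly k ones' over all odd k.
    """
    n = len(bits)
    exactly_odd = [
        threshold(bits, k) & (1 - threshold(bits, k + 1))
        for k in range(1, n + 1, 2)
    ]
    return int(any(exactly_odd))


def add_parity_bit(data: Sequence[int]) -> List[int]:
    """Append one even-parity bit (hypothetical helper for this sketch)."""
    return list(data) + [parity_via_thresholds(data)]


def parity_check(word: Sequence[int]) -> bool:
    """True if the word has even parity, i.e. no error was detected."""
    return parity_via_thresholds(word) == 0


if __name__ == "__main__":
    print(majority([1, 1, 1, 0, 0]))  # three of five inputs are 1
    print(majority([1, 1, 0, 0, 0]))  # two of five inputs are 1

    word = add_parity_bit([1, 0, 1, 1])
    print(parity_check(word))  # clean word

    # Flip each position in turn: the check fails every time, and the
    # failure looks identical regardless of which bit was flipped.
    for i in range(len(word)):
        corrupted = list(word)
        corrupted[i] ^= 1
        print(i, parity_check(corrupted))
```

The loop at the end is the point: the check returns the same answer for every single-bit flip, so the position of the error cannot be recovered from it. A real ECC such as a Hamming code needs several overlapping parity checks plus a decoding step to locate and repair the bit.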