It's very funny that people hold the autoregressive nature of LLMs against them while being far more strictly autoregressive themselves. It's just not consciously obvious to us.
I wonder whether we hold LLMs to a different standard because we have a long-reinforced expectation that a computer produces an exact result?
One of my first teachers told me that a computer never outputs anything wrong; it produces a result according to the instructions it was given.
LLMs follow this principle as well. It's just that when we assess the quality of their output, we compare it to what a deterministic program would produce, and that isn't really a valid comparison.
I think people tend not to understand what autoregressive methods are capable of in general (i.e., basically anything an alternative method can do), and worse, they sort of mentally treat them as equivalent to a context length of 1.
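To make the "context length of 1" point concrete, here is a toy sketch in Python. Nothing in it is a real model: `next_token_full_context`, `next_token_context_1`, and `generate` are made-up names, and the "model" is just a hand-written rule that closes open brackets. Both versions generate one token at a time, so both are autoregressive. The only difference is how much of the prefix each step is allowed to see.

```python
from typing import Callable, List

# Hypothetical toy vocabulary: opening brackets and their closers.
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def next_token_full_context(prefix: List[str]) -> str:
    """Choose the next token from the ENTIRE prefix: close the innermost open bracket."""
    stack = []
    for tok in prefix:
        if tok in CLOSERS:
            stack.append(tok)   # an opener is now pending
        elif stack:
            stack.pop()         # a closer resolves the innermost opener
    return CLOSERS[stack[-1]] if stack else "<eos>"


def next_token_context_1(prefix: List[str]) -> str:
    """Same rule, but only the last token is visible (a context length of 1)."""
    return next_token_full_context(prefix[-1:])


def generate(step: Callable[[List[str]], str], prompt: List[str], max_new: int = 16) -> List[str]:
    """Plain autoregressive loop: each new token is appended and fed back in."""
    tokens = list(prompt)
    for _ in range(max_new):
        tok = step(tokens)
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens


print("".join(generate(next_token_full_context, list("([{"))))  # ([{}])
print("".join(generate(next_token_context_1, list("([{"))))     # ([{}
```

The loop is identical in both cases. The version that sees the whole prefix finishes the nesting correctly, while the one restricted to the last token gives up after a single closer. The limitation people imagine comes from the tiny context, not from generating one token at a time.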