by pigscantfly 1303 days ago
No, an autoregressive language model is conditioned on all prior states, not just the previous one.

Multiply out the states, and "all prior states" becomes the "previous one". Easy to model as a Markov chain.
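To make "multiply out the states" concrete, here is a minimal sketch, assuming a fixed context window of k tokens. Each state of the first-order chain is the whole window, so conditioning on the previous state is the same as conditioning on the last k tokens. The names `augment` and `step` and the toy two-token vocabulary are made up for illustration.

```python
from itertools import product

def augment(vocab, k):
    """Enumerate the states of the first-order chain: every length-k token window."""
    return list(product(vocab, repeat=k))

def step(state, next_token):
    """One transition: drop the oldest token and append the new one."""
    return state[1:] + (next_token,)

vocab = ("a", "b")            # toy vocabulary, purely illustrative
states = augment(vocab, 3)    # 2**3 = 8 composite states
new_state = step(("a", "b", "a"), "b")

print(len(states))   # 8
print(new_state)     # ('b', 'a', 'b')
```

The state count is `len(vocab) ** k`, so it grows exponentially with the window length.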
Also 'easy' to model as a lookup table containing all possible solutions.
This is technically true, but the Markov chain would be too big to store even with petabytes of storage.
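A rough back-of-the-envelope count, assuming a 50,000-token vocabulary and a 2,048-token context (both numbers are illustrative assumptions, not taken from the thread). The chain has vocab_size ** context_len states, so it is easier to work with the base-10 logarithm:

```python
import math

def log10_num_states(vocab_size, context_len):
    """log10 of vocab_size ** context_len, the number of composite states."""
    return context_len * math.log10(vocab_size)

# Illustrative assumptions: 50k-token vocabulary, 2048-token context.
digits = log10_num_states(50_000, 2048)
petabyte_log10 = 15  # one petabyte is 10**15 bytes

print(f"states ~ 10^{digits:.0f}, one petabyte = 10^{petabyte_log10} bytes")
```

Under those assumptions the state count has over nine thousand decimal digits, against 15 for the number of bytes in a petabyte, and that is before storing any transition probabilities.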
Indeed. The argument boils down to: since it's finite, I can turn it into an FSA. Not only is that unhelpful, it also doesn't tell you how to construct it, i.e. the learning process.