
No, an autoregressive language model is conditioned on all prior states, not the previous one.


Multiply out the states: the tuple of "all prior states" collapses into a single "previous one". Easy to model as a Markov chain.
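A minimal sketch of the point, assuming a toy two-token vocabulary and a bounded context: take the *state* to be the entire history tuple, and the process is Markov in that composite state.

```python
from itertools import product

# Assumed toy setup: tiny vocabulary, bounded history length.
vocab = ["a", "b"]
context_len = 3

# One Markov state per possible prior sequence (including the empty one).
states = [tuple(p) for n in range(context_len + 1)
          for p in product(vocab, repeat=n)]

def step(state, token):
    # The transition just appends the token: the next composite state
    # again encodes the full history.
    return state + (token,)

s = ()
for tok in ["a", "b", "a"]:
    s = step(s, tok)
assert s == ("a", "b", "a")  # the single current state is the whole history
```

The construction is trivial, which is exactly the point of the replies below: it buys nothing, because the state space is the set of all possible contexts.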


Also 'easy' to model as a lookup table containing all possible solutions.


This is technically true, but the resulting Markov chain would be far too large to store, even with petabytes of storage.
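A back-of-the-envelope check, with assumed (illustrative) figures of a 50,000-token vocabulary and a 2,048-token context: the explicit chain needs one state per possible context.

```python
# Assumed figures for illustration only.
vocab_size = 50_000
context_len = 2_048

# One Markov state per possible context window.
num_states = vocab_size ** context_len
digits = len(str(num_states))  # a number with thousands of decimal digits
```

A petabyte is about 10^15 bytes; the state count here has roughly 9,600 digits, so "too big to store" is a dramatic understatement.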


Indeed. The argument boils down to: since the state space is finite, I can turn the model into an FSA. Not only is that unhelpful, it also doesn't tell you how to construct the automaton, i.e. it says nothing about the learning process.



