by mota7 1202 days ago
The paper says "... optimized on next-word prediction only", which is absolutely correct in 2023.

ChatGPT (and indeed all recent LLMs) use much more complex training methods than simply 'next-word prediction'.


This passage makes two claims:

* One, applicable to current language models (of which ChatGPT is one), is that "they fail to capture several syntactic constructs and semantics properties" and that "their linguistic understanding is superficial". It gives an example, "they tend to incorrectly assign the verb to the subject in nested phrases like ‘the keys that the man holds ARE here’", which is not the kind of mistake that ChatGPT makes.

* The other claim is that "when text generation is optimized on next-word prediction only", then "deep language models generate bland, incoherent sequences or get stuck in repetitive loops". Only this second claim relates to next-word prediction.

Yeah, that struck me too. I followed one of the refs at random and it was to a 2020 paper about RNNs.