by svara 311 days ago
This is a perfectly fine line of argument imo, but the GP didn't say that.

LLM research is trying out a lot of different things that move away from training purely on next-token prediction, and I buy the argument that not doing anything else would be limiting.

The model is still fundamentally a next-token predictor.