| Author here - thanks for engaging.
> One way to think of the bitter lesson as it applies to generative models is that ~all data carries some information about the structure of reality
Completely agree. It might not have come across, but what I'm pointing out in the post is that the data, as it is currently encoded in the models, is needlessly lossy. Tokens do not reveal all the information we have at our disposal.
In natural language, that's fine, because it's quite loose in structure. But if our domain is heavily structured (like modern programming languages are), why reveal only snippets of the linearised syntax of that structure to the model? Why not reveal the full structure we have at our disposal?
> and architectures that let you train on more data are better because they learn better underlying world models.
By this argument, wouldn't we conclude that training on chess using the game structure wouldn't work either, since that'd be a model that uses less data? Less data is the point, isn't it? |