by lispitillo
323 days ago
I hope/fear this HRM model is going to be merged with MoE very soon. Given the huge economic pressure to develop powerful LLMs, I think this could be done in just a month.

The paper seems to study only problems like sudoku solving, not question answering or other applications of LLMs. Furthermore, they omit a section on future applications or fusion with current LLMs. I think anyone working in this field can envision the applications, but the details of combining MoE with an HRM model could be their next paper.

I only skimmed the paper and I am not an expert; I'm sure others can explain why they don't discuss such a new structure. Anyway, my post is just blissful ignorance of the complexity involved and of the impossible task of predicting change.

Edit: A more general idea is that Mixture of Experts is related to clusters of concepts, and now we would have to consider clusters of concepts related by the time they take to be grasped. So, in a sense, the model would hold in latent space an estimate of the depth, number of layers, and time required for each concept, just as we adapt our reading style for a dense math book versus a short newspaper story.
In contrast, language modeling requires storing a large number of arbitrary phrases and their relation to each other, so I don't think you could ever get away with a similarly small model. Fortunately, a comparatively small number of steps typically seems to be enough to get decent results.
But if you tried to use an LLM-sized model in an HRM-style loop, it would be dog slow, so I don't expect anyone to try it anytime soon. Certainly not within a month.
Maybe you could have a hybrid where an LLM has a smaller HRM bolted on to solve the occasional constraint-satisfaction task.
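
For a rough sense of the "dog slow" point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the parameter counts and step count are hypothetical, not taken from the paper, and the cost model is just the common rule of thumb of about 2 FLOPs per parameter per token for a dense forward pass.

```python
# Back-of-the-envelope cost of running a model in an iterative (HRM-style) loop.
# All numbers below are hypothetical placeholders, not figures from the paper.

def forward_flops_per_token(params: float) -> float:
    """Rule of thumb: roughly 2 FLOPs per parameter per token (dense forward pass)."""
    return 2.0 * params

def looped_flops_per_token(params: float, steps: int) -> float:
    """Assume each loop iteration costs one full forward pass through the model."""
    return steps * forward_flops_per_token(params)

SMALL_PARAMS = 30e6   # hypothetical small recurrent model
LARGE_PARAMS = 7e9    # hypothetical LLM-sized model
STEPS = 16            # hypothetical number of loop iterations per output

small_looped = looped_flops_per_token(SMALL_PARAMS, STEPS)
large_single = forward_flops_per_token(LARGE_PARAMS)
large_looped = looped_flops_per_token(LARGE_PARAMS, STEPS)

print(f"small model, looped:  {small_looped:.2e} FLOPs/token")
print(f"large model, 1 pass:  {large_single:.2e} FLOPs/token")
print(f"large model, looped:  {large_looped:.2e} FLOPs/token")
print(f"slowdown from looping the large model: {large_looped / large_single:.0f}x")
```

Under these assumptions, looping is nearly free for the small model (still cheaper than a single pass of the large one), while looping the large model multiplies an already large cost by the step count. That is the asymmetry behind the hybrid idea: keep the loop on a small bolted-on module.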