Hacker News
by CuriousSkeptic 548 days ago
Is the sentence thing this one? https://ai.meta.com/research/publications/large-concept-mode...

I don’t get it: isn’t this concept modelling exactly what’s going on in the deeper layers of current LLMs?


Perhaps it does some similar grouping of content, but this approach more directly incentivizes longer-term grouping of tokens into abstract concepts. I agree that it's not obvious this would perform better than letting the model build its own structures for grouping tokens, but the proof is in the pudding: the technique led to improved results for a given model and training size. This newer approach gives the model the freedom to choose its own breakpoints, but still bakes the idea into the algorithm itself.
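To make the fixed-versus-model-chosen breakpoint distinction concrete, here is a toy sketch. It is my own illustration, not the method from the linked paper or from the newer approach: token embeddings are mean-pooled into one "concept" vector per segment, and segments end either at hard-coded sentence delimiters or wherever a per-token boundary score crosses a threshold. The function names, the mean-pooling, and the threshold rule are all assumptions chosen for illustration.

```python
from typing import FrozenSet, List, Sequence

Vector = List[float]


def pool_segments(embeddings: Sequence[Vector], boundaries: Sequence[int]) -> List[Vector]:
    """Mean-pool token embeddings into one 'concept' vector per segment.

    `boundaries` holds the exclusive end index of each segment, e.g. [3, 5]
    splits 5 tokens into tokens 0-2 and tokens 3-4.
    """
    concepts: List[Vector] = []
    start = 0
    for end in boundaries:
        segment = embeddings[start:end]
        if not segment:
            raise ValueError(f"empty segment [{start}, {end})")
        dim = len(segment[0])
        # Average each embedding dimension over the tokens in this segment.
        concepts.append([sum(tok[d] for tok in segment) / len(segment) for d in range(dim)])
        start = end
    if start != len(embeddings):
        raise ValueError("boundaries must end at len(embeddings)")
    return concepts


def fixed_boundaries(
    tokens: Sequence[str], delimiters: FrozenSet[str] = frozenset({".", "?", "!"})
) -> List[int]:
    """Hard-coded breakpoints: end a segment right after each sentence delimiter."""
    ends = [i + 1 for i, tok in enumerate(tokens) if tok in delimiters]
    if not ends or ends[-1] != len(tokens):
        ends.append(len(tokens))  # close any trailing partial sentence
    return ends


def scored_boundaries(scores: Sequence[float], threshold: float) -> List[int]:
    """Model-chosen breakpoints: end a segment after any token whose
    (hypothetical, e.g. learned) boundary score exceeds the threshold."""
    ends = [i + 1 for i, s in enumerate(scores) if s > threshold]
    if not ends or ends[-1] != len(scores):
        ends.append(len(scores))  # close the final segment
    return ends


if __name__ == "__main__":
    tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
    embeddings = [[float(i), 1.0] for i in range(len(tokens))]
    print(pool_segments(embeddings, fixed_boundaries(tokens)))
```

The only difference between the two regimes is where `boundaries` comes from: a fixed rule baked in by hand, or a score the model itself produces. The downstream pooling is the same either way.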

What it means is a harder question. Perhaps transformers are simply an inefficient computational structure for this process? Perhaps a more flexible computational structure would integrate this step more efficiently? Perhaps transformers are efficient enough, but our learning/densifying isn't? Or perhaps it's such a core, powerful step that it might as well be built into the algorithm regardless? Much to learn.