by mirekrusin
474 days ago
You're explaining it nicely and then seem to make a mistake that contradicts what you've just said – because code and text share a domain (text based) – large, generic models will always out-compete smaller, specialized ones – that's the lesson. If you compared it with e.g. a model for self-driving cars – generic text models will not win because they operate in a different domain. In all cases, trying to optimize on subset/specialized tasks within a domain is not worth the investment, because the state of the art will be held by larger models working on the whole available set.
> You're explaining it nicely and then seem to make a mistake that contradicts what you've just said – because code and text share a domain (text based)
"Text" is not the domain that matters.
The whole trick behind LLMs being as capable as they are is that they're able to tease out concepts from all that training text - concepts of any kind, from things to ideas to patterns of thinking. The latent space of those models has enough dimensions to encode just about any semantic relationship as some distinct direction, and practice shows this is largely what happens. That's what makes style transfer pretty much a vector operation (instead of "-Man +Woman", think "-Academic +Funny"), why LLMs are so good at translating between languages and from spec to code, and why adding modalities worked so well.
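To make the "vector operation" point concrete, here's a toy sketch. Everything in it is made up for illustration: the three axes, the eight words, and the numbers are hand-picked assumptions, not values from any real model, and the `shift` helper is a hypothetical name. Real embeddings have thousands of learned dimensions and the directions are much messier, but the arithmetic is the same idea.

```python
import math

# Hand-made toy embeddings. Axes are (royalty, femininity, humor).
# These numbers are invented for illustration, not taken from a real model.
EMBEDDINGS = {
    "king":     (0.9, 0.1, 0.2),
    "queen":    (0.9, 0.9, 0.2),
    "man":      (0.1, 0.1, 0.2),
    "woman":    (0.1, 0.9, 0.2),
    "academic": (0.2, 0.5, 0.1),
    "funny":    (0.2, 0.5, 0.9),
    "lecture":  (0.1, 0.5, 0.1),
    "standup":  (0.1, 0.5, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def shift(word, minus, plus):
    """Compute word - minus + plus and return the nearest other word."""
    target = tuple(
        w - m + p
        for w, m, p in zip(EMBEDDINGS[word], EMBEDDINGS[minus], EMBEDDINGS[plus])
    )
    # Exclude the three input words so the result is a genuinely new word.
    candidates = (k for k in EMBEDDINGS if k not in {word, minus, plus})
    return max(candidates, key=lambda k: cosine(EMBEDDINGS[k], target))


print(shift("king", "man", "woman"))          # the classic analogy
print(shift("lecture", "academic", "funny"))  # same operation, used as "style transfer"
```

The second call is the same subtraction and addition as the first; only the direction being swapped differs (a style axis instead of a gender axis).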
With LLMs, the common domain between "text" and "code" is not "text", but the way humans think, and the way they understand reality. It's not the raw sequences of tokens that map between, say, poetry or academic texts and code - it's the patterns of thought behind those sequences of tokens.
Code is a specific domain - beyond being the lifeblood of programs, it's also an exercise in a specific way of thinking, taken up to 11. That's why learning code turned out to be crucial for improving the general reasoning abilities of LLMs (the same is, IMO, true for humans, but it's harder to prove rigorously). And conversely, text in general provides context for code that would be hard to infer from code alone.