by knexer
1137 days ago
I like the first-order vs second-order distinction here - it's a clean way to describe something that I've often found hard to communicate to others, at least to those familiar with functional programming. Everyone's familiar with first-order use of a language model at this point (it's just plain ChatGPT), but higher-order use seems much more difficult for most people to even conceptualize, much less grasp the implications of.

The huge challenge with higher-order use of LLMs is that higher-order constructs are inherently more chaotic - the inconsistency and unreliability of an LLM compound exponentially when it's used recursively. Just look at how hard it is to keep AutoGPT from going off the rails. Any higher-order application of LLMs needs to contend with this, and that requires building in redundancy, feedback loops, quality checking, and other things that programmers just aren't used to needing.

More powerful models and better alignment techniques will help, but at the end of the day it's a fundamentally different engineering paradigm. We've been spoiled by the extreme consistency and reliability of traditional programming constructs; I suspect higher-order LLM use might be easier to think about in terms of human organizations, or distributed systems, or perhaps even biology, where we don't have this guarantee of a ~100% consistent atom that can be composed.

Half-baked aside: in some ways this seems like a generalization of Conway's law (organizations create software objects that mirror their own structure), where now we have a third player that's a middle ground between humans and software. It's unclear how this third player will fit in - one could envision many different structures, and it's unclear which are feasible and which would be effective. Exciting times!
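To put rough numbers on the compounding point above, here is a minimal back-of-the-envelope sketch in Python. It rests on assumptions that are not in the comment: every step succeeds independently with the same probability, and a quality check catches every failure so a failed step can simply be retried. Real LLM pipelines satisfy neither assumption exactly, and the function names are made up for illustration.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that n_steps independent steps all succeed (assumed model)."""
    return p_step ** n_steps


def chain_success_with_retries(p_step: float, n_steps: int, attempts: int) -> float:
    """Same chain, but each step may be attempted up to `attempts` times.

    Assumes a perfect checker: a step only fails if every attempt fails.
    """
    p_checked_step = 1.0 - (1.0 - p_step) ** attempts
    return p_checked_step ** n_steps


if __name__ == "__main__":
    # A step that is right 95% of the time, composed 20 levels deep.
    print(f"no checking:     {chain_success(0.95, 20):.3f}")
    print(f"up to 3 retries: {chain_success_with_retries(0.95, 20, 3):.3f}")
```

Under these assumptions a 95%-reliable step composed 20 times succeeds only about a third of the time, while a checked step with a few retries brings the whole chain back close to 1, which is the kind of redundancy the comment is arguing for.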
That does speak to the gains you can get by orchestrating multiple runs, even with something as simple as taking the majority. I'm assuming the multiple-choice setup let the model think in a scratchpad before answering, or something similar, since taking multiple runs of a single next character (A, B, C, or D) would probably be much the same as lowering the temperature and taking one measurement.
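A small Python sketch of that last point, using a made-up answer distribution instead of a real model (the probabilities, the function names, and the run count are all assumptions for illustration): when each run emits only a single answer letter, the majority over many samples converges on the most likely letter, which is what greedy (temperature near zero) decoding would return in one call.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def greedy_choice(probs):
    """What temperature -> 0 decoding picks: the single most likely letter."""
    return max(probs, key=probs.get)


def sample_votes(probs, n_runs, rng):
    """Draw n_runs independent single-letter answers from the distribution."""
    letters = list(probs)
    weights = [probs[letter] for letter in letters]
    return rng.choices(letters, weights=weights, k=n_runs)


# Hypothetical next-token distribution over the four options.
probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}

if __name__ == "__main__":
    votes = sample_votes(probs, 5001, random.Random(0))
    print("majority of samples:", majority_vote(votes))
    print("greedy choice:      ", greedy_choice(probs))
```

Voting only adds information beyond this when the runs can differ in their intermediate reasoning, so that the final answers are not just draws from one fixed four-way distribution.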