|
|
|
|
|
by gwern
703 days ago
|
|
An entertaining outcome here is that LLMs may render the docs vs code debate largely moot: as LLM coding capabilities increase and the cost per token plummets, it becomes increasingly possible to simply stop writing code at all, and instead write docs which are 'compiled' each time by a LLM to code which is then compiled normally and the code thrown away. The code can never get out of sync with the docs because it is always generated from the docs, in a way that previous brittle fragile complicated 'generate code from docs' approaches could only vaguely dream of. To do bug fixes, one simply updates the docs to explain the new behavior and intentions, and perhaps include an example (ie. unit test) or a property. This is then reflected in the new version of the codebase - the codebase as a whole, not simply one function or module. So the global refactoring or rewrites happen automatically, simply from conditioning on the new docs as a whole. This might sound breathtaking inefficient and expensive, but it's just the next step in the long progression from the raw machine ops to assembler to low-level languages like C or LLVM to high-level languages to docs/specifications... I'm sure at each step, the masters of the lower stage were horrified by the profligacy and waste of just throwing away the lower stage each time and redoing everything from scratch. |
|
https://www.efsa.europa.eu/sites/default/files/event/180918-...
Which is to say that with an iterative, human-computer interaction (HCI), that is back-ended by a GPT (API) algorithm which can learn from the conversation and perhaps be enhanced by RAG (retrieval augmented generation) of code AND documentation (AKA prompt engineering), results beyond the average intern-engineer pair are not easily achievable, but increasingly probable given how both humans and computers are learning iteratively as we interact with emergent technology.
The key is that realizing that the computer can generate code, but that code is going to be frequently bad, if not hallucinatory in its compilability and perhaps computability and therefore, the human MUST play a DevOps or SRE or tech writer role pairing with the computer to produce better code, faster and cheaper.
Subtract either the computer or the human and you wind up with the same old, same old. I think what we want is GPT-backed metaprogramming produce white box tests precisely because it can see into the design and prove the code works before the code is shared with the human.
I don't know about you, but I'd trust AI a lot further if anything it generated was provable BEFORE it reached my cursor, not after.
The same is true here today.
Why doesn't every GPT interaction on the planet, when it generates code, simply generate white box tests proving that the code "works" and produces "expected results" to reach consensus with the human in its "pairing"?
I'm still guessing. I've posed this question to every team I've interacted with since this emerged, which includes many names you'd recognize.
Not trivial, but increasingly straightforward given the tools and the talent.