If LLMs are the new compilers, enabling software to be built with natural language, why can't LLMs just generate bytecode directly? Why generate HLL code at all?
Same reason humans use high-level languages: limited context windows.
Both humans and LLMs benefit from non-leaky abstractions: they offload low-level details and free up mental or computational bandwidth for higher-order concerns. When, say, implementing a permissioning system for a web app, I can't simultaneously track memory allocation and how my data model choices align with product goals. Abstractions let me ignore the former to "spend" my limited intelligence on the latter; same with LLMs and their context limits.
Yes, more intelligence (at least in part) means being able to handle larger contexts, and maybe superintelligent systems could keep everything "in mind." But even then, abstraction likely remains useful in trading depth for surface area. Chris Sawyer was brilliant enough to write Rollercoaster Tycoon in assembly, but probably wouldn't be able to do the same for Elden Ring.
(Also, at least until LLMs are so transcendentally intelligent they outstrip our ability to understand their actions, HLLs are much more verifiable by humans than assembly is. Admittedly, this might be a time-limited concern.)
Why would the ability to generate source code imply the ability to generate bytecode? Also you wouldn’t want that, humans can’t review bytecode. I think you may be taking the metaphor too literally.
Because the semantics of each construct in a programming language map pretty much 1:1 onto a sequential, logic-based ordering of instructions in bytecode (which is still code).
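For a concrete look at how close that mapping is, here is a minimal sketch using CPython's standard `dis` module. The `add` function and the `opnames` helper are made up for illustration, and the exact instruction names differ between CPython versions, so no output is shown.

```python
import dis


def add(a, b):
    # One source-level expression; the compiler lowers it to a short,
    # ordered sequence of stack-machine instructions.
    return a + b


def opnames(func):
    """Return the names of the bytecode instructions CPython compiled func into."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # Print the full disassembly of add() for inspection.
    dis.dis(add)
```

On the versions I'm aware of, the single `a + b` shows up as loads, one binary-operation instruction, and a return, so the correspondence is close but not strictly one instruction per source term.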
> Also you wouldn’t want that, humans can’t review bytecode
The one great thing about automation (and formalism) is that you don't have to continuously review it. You vet it once, then you add another mechanism that monitors for wrong output/behavior. And now, the human is free for something else.
I don't think they are... LLMs can learn from anything that's been tokenized. Feed in enough decompiled and labeled data alongside the bytecode and it's likely the machine will be able to dump out an executable. I wouldn't be surprised if an LLM could output a valid ELF right now, unless the relevant tokens were stripped in pretraining.
The vibe coders would tell you: you don't. You test the program, or ask the LLM to write tests for you, and if there are any issues, you ask it to fix them. And you do that in a loop until there are no more issues.
I imagine that at some point they must wonder what their role is, and why the LLM couldn't do all of that independently.
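The test-and-fix loop described above can be sketched roughly like this. It's only an illustration: `generate` and `run_tests` are hypothetical stand-ins for an LLM call and a test harness, not any real API, and the `max_rounds` cap is my own assumption so the loop can't spin forever.

```python
from typing import Callable, List, Optional, Tuple


def fix_until_green(
    generate: Callable[[str, List[str]], str],
    run_tests: Callable[[str], List[str]],
    prompt: str,
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Ask for code, test it, feed failures back; stop when tests pass or rounds run out."""
    failures: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Hypothetical LLM call: gets the prompt plus the previous round's failures.
        code = generate(prompt, failures)
        # Hypothetical test harness: an empty list means everything passed.
        failures = run_tests(code)
        if not failures:
            return code, round_no
    # Gave up: no candidate passed within the allowed number of rounds.
    return None, max_rounds
```

Nothing in the loop itself needs a human, which is the point being made: the person's contribution is whatever goes into `prompt` and into deciding what `run_tests` should check.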