	by dingnuts 359 days ago
	If LLMs are the new compilers, enabling software to be built with natural language, why can't LLMs just generate bytecode directly? Why generate HLL code at all?

3 comments

akavi 359 days ago

Same reason humans use high-level languages: limited context windows.

Both humans and LLMs benefit from non-leaky abstractions: they offload low-level details and free up mental or computational bandwidth for higher-order concerns. When, say, implementing a permissioning system for a web app, I can't simultaneously track memory allocation and how my data model choices align with product goals. Abstractions let me ignore the former to "spend" my limited intelligence on the latter; same with LLMs and their context limits.

Yes, more intelligence (at least in part) means being able to handle larger contexts, and maybe superintelligent systems could keep everything "in mind." But even then, abstraction likely remains useful in trading depth for surface area. Chris Sawyer was brilliant enough to write RollerCoaster Tycoon in assembly, but probably wouldn't be able to do the same for Elden Ring.

(Also, at least until LLMs are so transcendentally intelligent they outstrip our ability to understand their actions, HLLs are much more verifiable by humans than assembly is. Admittedly, this might be a time-limited concern.)

Uehreka 359 days ago

Why would the ability to generate source code imply the ability to generate bytecode? Also you wouldn’t want that, humans can’t review bytecode. I think you may be taking the metaphor too literally.

skydhash 359 days ago

Because the semantics of each term in a programming language map pretty much 1:1 onto a sequential, logic-based ordering of terms in bytecode (which is still code).
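
To make the source-to-bytecode relation concrete, here is a minimal sketch using Python's standard `dis` module. The function `is_admin` is a hypothetical example, not something from this thread, and the exact opcodes printed depend on the CPython version, so no output is listed.

```python
import dis


def is_admin(user_roles):
    # Hypothetical example function, chosen only for illustration.
    return "admin" in user_roles


# Collect the bytecode instructions CPython compiled for the function above.
instructions = list(dis.get_instructions(is_admin))

# Print offset, opcode name, and a readable argument for each instruction.
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

Running this shows the single `return` line lowering to a short, ordered sequence of instructions, which is the kind of listing a human reviewer would have to read if the source were skipped.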

> Also you wouldn’t want that, humans can’t review bytecode

The one great thing about automation (and formalism) is that you don't have to continuously review it. You vet it once, then you add another mechanism that monitors for wrong output/behavior. And now, the human is free for something else.

pixl97 359 days ago

I don't think they are... LLMs can learn from anything that's been tokenized. Feed in enough decompiled and labeled data alongside the bytecode and it's likely the machine will be able to dump out an executable. I wouldn't be surprised if an LLM could output a valid ELF right now, except that the relevant tokens may have been stripped in pretraining.

bird0861 359 days ago

https://ai.meta.com/research/publications/meta-large-languag...

VinLucero 359 days ago

I agree here. English (human language) to Bytecode is the future.

With reverse translation as needed.

thfuran 359 days ago

English is a pretty terrible language for describing the precise behavior of a program.

demirbey05 359 days ago

How will you detect or fix hallucinated assembly code?

imiric 358 days ago

The vibe coders would tell you: you don't. You test the program, or ask the LLM to write tests for you, and if there are any issues, you ask it to fix them. And you do that in a loop until there are no more issues.
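
The loop described above can be sketched roughly like this. Everything here is an assumption for illustration: `fix_until_green`, `run_tests`, and `ask_model_to_fix` are hypothetical names, and the two `fake_*` functions are stand-ins for a real test runner and a real LLM call.

```python
from typing import Callable, List, Optional


def fix_until_green(
    source: str,
    run_tests: Callable[[str], List[str]],
    ask_model_to_fix: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Optional[str]:
    """Test, ask for a fix, repeat; give up after max_rounds fix attempts."""
    for _ in range(max_rounds):
        failures = run_tests(source)
        if not failures:
            return source  # no more issues reported
        source = ask_model_to_fix(source, failures)
    # Check the result of the final fix attempt before giving up.
    return source if not run_tests(source) else None


def fake_run_tests(source: str) -> List[str]:
    # Stand-in test runner: "passes" only if the addition is correct.
    return [] if "return a + b" in source else ["add(2, 3) should be 5"]


def fake_fix(source: str, failures: List[str]) -> str:
    # Stand-in for the LLM: applies the one fix this toy example needs.
    return source.replace("a - b", "a + b")


buggy = "def add(a, b):\n    return a - b\n"
fixed = fix_until_green(buggy, fake_run_tests, fake_fix)
print(fixed)
```

Note that the loop only ends successfully if the tests themselves are trustworthy, and the `max_rounds` cap is there because nothing guarantees the fixes converge.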

I imagine that at some point they must wonder what their role is, and why the LLM couldn't do all of that independently.
