


	by klodolph 2 days ago
	On the off chance that you’re serious, that would result in disastrously bad output. The difference between “jmp $+15” and “jmp $+16” is inscrutable and the LLM would not be able to pick the right one without tooling. That tooling is a compiler. The higher-level the target language, the better the chance that the LLM can be steered toward good output. Machine code is hopeless, don’t bother.


torginus 2 days ago

> The difference between “jmp $+15” and “jmp $+16” is inscrutable

Just like the difference between 'him' and 'her' is inscrutable taken out of context, but that's why LLMs have embeddings, which store contextual information in huge vectors, and an input processing phase during which the input tokens gain contextual information, so that the LLM knows that 'him' refers to 'Peter' and 'her' refers to 'Jane'. Likewise, it will be able to infer that $+15 is the 'success' branch of control flow and $+16 is the fail branch.

The way computer programs and natural language differ is that in language, words with absolute or at least very constrained meanings are common, while code is basically pure manipulation of symbols, with variable and function names being meaningless helpers; the actual meaning has to be deduced from the way these symbols are manipulated.

In fact, I think LLMs are actually surprisingly good at this kind of abstract symbol manipulation, and are far less bothered than humans by the fact that the meanings of 'rax' and 'rcx' in 'add rax, rcx' are heavily contextual, since they dedicate a lot of processing to building up rich contextual information that may differ at every place these symbols appear.

klodolph 2 days ago

> Just like the difference between 'him' and 'her' is inscrutable taken out of context,

The context is pretty flexible, like "Do you know Jim? I saw him at the store." Or, "Do you know Jim? Fifteen days ago, I saw him at the store." There’s a relatively small universe of pronouns (him, her, that, who, etc.) and the pronouns refer to a token nearby (in this case, Jim).

With machine code, there’s a massive set of jump offsets, and the referent isn’t a token, but rather a location to start processing.

> In fact, I think LLMs are actually surprisingly good at this kind of abstract symbol manipulation,

When you’re manipulating machine code, you’ve stepped away from abstract symbol manipulation and you’re just manipulating byte values now.

I don’t think your argument here is convincing. Maybe you can point to a demo or some architecture where this works. But my sense is this—once you start designing a harness to make LLMs capable of writing machine code, or designing an architecture for LLMs to write machine code, something in your implementation probably looks like an assembler, and something in your internal tokenization of the machine code probably looks like a higher-level language.
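The claim that such a harness ends up looking like an assembler can be illustrated with a toy two-pass label resolver. This is a minimal sketch, assuming only two kinds of instruction: raw bytes and x86 short jumps (opcode `0xEB` plus a signed 8-bit displacement) to a named label. The `(kind, arg)` program format and the `assemble` function are hypothetical, and real assemblers also do jump relaxation (choosing between short and near encodings), which this skips.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical program format: a list of (kind, arg) pairs where kind is
#   "label" -> arg is a label name (emits no bytes)
#   "raw"   -> arg is a bytes object copied verbatim
#   "jmp"   -> arg is a label name, emitted as an x86 short jump (0xEB rel8)
Item = Tuple[str, Union[str, bytes]]

SHORT_JMP_LEN = 2  # opcode byte + signed 8-bit displacement


def assemble(program: List[Item]) -> bytes:
    # Pass 1: walk the program once, assigning an address to every label.
    labels: Dict[str, int] = {}
    addr = 0
    for kind, arg in program:
        if kind == "label":
            labels[arg] = addr
        elif kind == "jmp":
            addr += SHORT_JMP_LEN
        elif kind == "raw":
            addr += len(arg)
        else:
            raise ValueError(f"unknown item kind: {kind!r}")

    # Pass 2: emit bytes. The tool, not the author, computes each displacement,
    # measured from the end of the jump instruction to the label's address.
    out = bytearray()
    for kind, arg in program:
        if kind == "jmp":
            disp = labels[arg] - (len(out) + SHORT_JMP_LEN)
            if not -128 <= disp <= 127:
                raise ValueError(f"jump to {arg!r} is out of rel8 range")
            out += bytes([0xEB, disp & 0xFF])  # two's-complement displacement
        elif kind == "raw":
            out += arg
    return bytes(out)


program = [
    ("jmp", "done"),
    ("raw", bytes([0x90] * 13)),  # 13 one-byte NOPs
    ("label", "done"),
    ("raw", bytes([0xC3])),       # ret
]
print(assemble(program).hex(" "))
```

The jump comes out as `eb 0d`, which is exactly the `jmp $+15` from upthread, and nobody had to count the NOPs by hand: add or remove one and the next run simply produces a different displacement.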

pjmlp 2 days ago

That compiler does wonders with languages that have UB in their specs, especially when it runs optimization passes driven by heuristics.

Also, there are dynamic compilers where the shape of the machine code changes as the code executes, and each individual execution will certainly generate different sequences, depending on how the program runs and where it is running.

Deterministic code generation in JIT compilers, at least optimising ones, is not a solved problem.

faangguyindia 2 days ago

What about AOT optimization, which brings AOT closer to JIT performance? Isn't that something an LLM plus a harness could easily do?

klodolph 2 days ago

I think the notion that AOT is inherently faster than JIT, or vice versa, has been thoroughly debunked.

You can have LLMs help you optimize code but I don’t think you can do this unattended for non-trivial code.

jenadine 2 days ago

> The difference between “jmp $+15” and “jmp $+16” is inscrutable

I don't see why that's the case. LLM trained on binary would totally see it, not?

Also the tool can also be running the test and a debugger.

klodolph 2 days ago

> I don't see why that's the case. LLM trained on binary would totally see it, not?

It would not. You find the correct version by counting the number of bytes to the destination. LLMs are famously bad at this kind of problem (counting).
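To make the byte-counting point concrete, here is a minimal sketch of what an assembler does for `jmp $+N`, assuming the 2-byte x86 short jump (opcode `0xEB` followed by a signed 8-bit displacement). In NASM syntax `$` is the address where the current instruction starts, while the CPU adds the displacement to the address of the next instruction, so the encoded value is N minus the instruction length. `encode_short_jmp` is a hypothetical helper, not part of any tool.

```python
# Sketch: encode "jmp $+N" as an x86 short jump (opcode 0xEB, rel8).
# "$" is where this instruction starts; the CPU measures the displacement
# from the end of the instruction, i.e. from the next instruction's address.

SHORT_JMP_OPCODE = 0xEB
SHORT_JMP_LEN = 2  # one opcode byte + one signed 8-bit displacement


def encode_short_jmp(offset_from_start: int) -> bytes:
    """Encode 'jmp $+offset_from_start' as a 2-byte x86 short jump."""
    # Convert "bytes from the start of this jmp" into "bytes from its end".
    disp = offset_from_start - SHORT_JMP_LEN
    if not -128 <= disp <= 127:
        raise ValueError(f"displacement {disp} does not fit in a signed byte")
    # Store the signed displacement in two's complement.
    return bytes([SHORT_JMP_OPCODE, disp & 0xFF])


for n in (15, 16, 0):
    print(f"jmp $+{n} ->", encode_short_jmp(n).hex(" "))
```

`jmp $+15` and `jmp $+16` differ only in the last byte (`eb 0d` vs `eb 0e`), and `jmp $` comes out as `eb fe`, the familiar two-byte infinite loop. Nothing in the bytes themselves says which one lands on the intended instruction; that depends on the exact sizes of everything in between.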

> Also the tool can also be running the test and a debugger.

The test needs to provide a good amount of signal. That’s too hard if you are throwing machine code at the wall.

In order for debuggers to work, you need some kind of model that describes what the code should do and what state the computer should be in after each instruction. That model is high-level code.

I can understand the intuitive appeal of training LLMs on machine code, but all of my experience with LLMs suggests that they are incredibly ill-suited to the task, and we just don’t have the capacity to train them to make useful machine code.

zx8080 2 days ago

Can "LLMs are bad at counting" be generalized to "LLMs are better at complex stuff but make more mistakes on simple stuff"?

fluoridation 2 days ago

I would phrase it as "LLMs are good at big picture stuff and bad at fine detail", or to put it another way, they're accurate, but imprecise and with low reproducibility.

bregma 2 days ago

It is my experience that it's the opposite. LLMs are very very precise but wildly inaccurate. They might give you 17 significant digits but be off by 10 orders of magnitude, to use a metaphor.

fluoridation 2 days ago

Sounds like we're in agreement, then. The 7 digits it got correct are the big picture, and the rest are the details. Are you disagreeing with my statement or with my usage of "accurate" and "precise"?

benj111 2 days ago

But where does that leave us when programmers treat themselves as architects with the AI doing the drudge work, as seems to be the fashion?

It then means you have 2 parties focussing on the big picture and no one focussing on the details.

fluoridation 2 days ago

I said "big picture stuff", but I guess I should have said "broad strokes". The truly correct answer is probably similar to what the model will answer, and if your problem is such that it can work with small imperfections in a solution, then the LLM helps. If the solution needs to be exactly right, then it will probably fail.

Yesterday on a whim I tried asking a local model a question about kanji that look different in different fonts despite being the same character (to the point of strokes appearing in completely different directions), and the model hallucinated imgur links to images of the characters. If imgur could work with approximate references to data maybe that would have worked.

ozlikethewizard 2 days ago

It's more that LLMs are better at vague problems with multiple imperfect solutions, and struggle with problems that require precision.

klodolph 2 days ago

No, I don’t think so. LLMs are good at a lot of simple tasks, but bad at certain simple tasks. Moravec’s paradox in a new iteration.

It applies to humans too. Calculus is “simple” but it takes something like sixteen years to train a human to do it, if all goes well. Meanwhile, most humans think that inverse kinematics is, like, the easiest thing in the world (it’s a super complicated task).

fluoridation 2 days ago

Calculus is definitely the harder task, considering it took a species developing the cognitive capacity for symbolic reasoning for it to show up, whereas any animal can figure out how to position its limbs. Yeah, we figured out how to make CAS programs before inverse kinematics software, but that's because computers were made to solve numerical problems, not to replace the cerebella of chordates.

klodolph 2 days ago

> Calculus is definitely the harder task,

You’re only evaluating “harder” or “easier” based on the perspective of somebody who has a mammalian brain with millions of years of selective pressure to make it suitable for solving inverse kinematics problems.

The point here is that when we start constructing agents or tools with different architectures to ourselves, it makes sense to reevaluate notions of whether something is ‘hard’ or ‘easy’. LLMs are bad at counting not because counting is hard, but because their architecture makes it hard.

dezgeg 2 days ago

Even if it could, it would be ridiculously token-inefficient to update a huge number of addresses whenever a small change is made in the middle of a binary.
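The cost described here can be sketched: insert bytes in the middle of the code and every relative jump that spans the insertion point needs a new displacement. This minimal sketch assumes a hypothetical list of `Jump(at, target)` records, insertions that land on instruction boundaries, and jump targets that follow the instruction they point at; it ignores absolute addresses (pointers, jump tables, relocations), which would add even more edits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jump:
    at: int      # address of the relative jump instruction (hypothetical record)
    target: int  # absolute address it transfers control to


def shift(addr: int, insert_at: int, n_bytes: int) -> int:
    # Everything at or after the insertion point moves forward by n_bytes.
    return addr + n_bytes if addr >= insert_at else addr


def jumps_needing_patch(jumps: List[Jump], insert_at: int, n_bytes: int) -> List[Jump]:
    """Return the jumps whose relative displacement changes after an insertion."""
    changed = []
    for j in jumps:
        # The jump's own length is unchanged, so comparing target - at is
        # equivalent to comparing displacements measured from its end.
        old_disp = j.target - j.at
        new_disp = shift(j.target, insert_at, n_bytes) - shift(j.at, insert_at, n_bytes)
        if new_disp != old_disp:
            changed.append(j)
    return changed


jumps = [Jump(0, 100), Jump(50, 10), Jump(200, 300), Jump(120, 20)]
for j in jumps_needing_patch(jumps, insert_at=60, n_bytes=4):
    print(j)
```

Jumps entirely before or entirely after the insertion keep their encoding; only the two that cross address 60 change. In a real binary that can be a large fraction of all jumps, and a displacement that grows past the rel8 range forces a longer encoding, which shifts the code again.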