by manuelabeledo 131 days ago
Even those are way more predictable than LLMs, given the same input. But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.
2 comments

> But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.

They are, actually. A "fresh chat" with an LLM is non-deterministic but also stateless. Of course agentic workflows add memory, possibly RAG etc. but that memory is stored somewhere in plain English; you can just go and look at it. It may not be stateless but the state is fully known.

Using the managed runtime analogy, what you are saying is that, if I wanted to benchmark LLMs like I would do with runtimes, I would need to take the delta between versions, plus that between whatever memory they may have. I don’t see how that helps with reproducibility.

Perhaps more importantly, how would I quantify such “memory”? In other words, how could I verify that two memory inputs are the same, and how could I formalize the entirety of such inputs with the same outputs?
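For the narrow part of that question (are two memory inputs literally the same?), here is a minimal sketch, assuming the memory is available as an ordered list of plain-text entries. The function name `memory_fingerprint` and the sample entries are hypothetical, not taken from any agent framework. It normalizes Unicode and whitespace, then hashes the result, so two memories compare equal only if they match entry for entry. It does not address the harder part: whether two differently worded memories would lead to the same outputs.

```python
import hashlib
import json
import unicodedata


def memory_fingerprint(entries):
    """Return a SHA-256 hex digest for an ordered list of memory strings.

    Assumption: memory is a list of plain-text entries, and order matters.
    """
    # Normalize Unicode form and collapse runs of whitespace in each entry.
    normalized = [
        " ".join(unicodedata.normalize("NFC", entry).split())
        for entry in entries
    ]
    # Serialize with fixed separators so the byte representation is stable.
    payload = json.dumps(normalized, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical memory contents, for illustration only.
memory_a = ["User prefers tabs.", "Project targets  the JVM."]
memory_b = ["User prefers tabs.", "Project targets the JVM."]  # whitespace differs
memory_c = ["Project targets the JVM.", "User prefers tabs."]  # order differs

same_ab = memory_fingerprint(memory_a) == memory_fingerprint(memory_b)
same_ac = memory_fingerprint(memory_a) == memory_fingerprint(memory_c)
```

Here `same_ab` is true because only whitespace differs, and `same_ac` is false because the entries are in a different order. Treating order as significant is a design choice in this sketch; sort the entries first if order should not matter.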

Are you certain you can predict the JIT-generated machine code given the JVM bytecode?

Without taking into account anything else the JIT uses in its decision tree?

For a single execution, to a certain extent, yes.

But that’s not the point I’m trying to make here. JIT compilers are vastly more predictable than LLMs. I can take any two JVMs from any two vendors, and over several versions and years, I’m confident that they will produce the same outputs given the same inputs, to a certain degree, where the input is not only code but GC, libraries, etc.

I cannot do the same with two versions of the same LLM offering from a single vendor that were released one year apart.

Good luck mapping OpenJDK to Azul's cloud JIT in terms of generated machine code.
The output here is the actual program output, not the bytecode. No one is arguing about that in the context of LLMs.
Enough so that I've never had a runtime issue because the compiler did something odd once and something correct the next time. At least in C#. If Java is doing that, then stop using it...

If the compiler had an issue like LLMs do, then half my builds would be broken, running the same source.
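The "same source, same result" property being argued about can be checked mechanically. Below is a rough sketch, not a real benchmark: `distinct_outputs` calls a function repeatedly with the same input and counts how many different outputs appear. Both `compile_like` and `sampler_like` are hypothetical stand-ins (neither invokes an actual compiler or model); the second one simply rotates through variants to imitate a process whose output changes between runs.

```python
import hashlib
import itertools


def distinct_outputs(fn, arg, runs=20):
    """Call fn(arg) `runs` times and count the distinct outputs seen."""
    digests = set()
    for _ in range(runs):
        output = fn(arg)
        digests.add(hashlib.sha256(repr(output).encode("utf-8")).hexdigest())
    return len(digests)


def compile_like(source):
    # Hypothetical stand-in for a deterministic tool: same input, same output.
    return source.upper()


_calls = itertools.count()


def sampler_like(source):
    # Hypothetical stand-in for a process whose output varies between runs.
    # It cycles through three variants instead of sampling randomly.
    return f"{source}#{next(_calls) % 3}"


stable = distinct_outputs(compile_like, "int main")
unstable = distinct_outputs(sampler_like, "int main")
```

A result of 1 means every run agreed. Anything above 1 means the same input produced different outputs, which is the situation described above as broken builds from unchanged source.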