Hacker News new | ask | show | jobs
by phpnode 41 days ago
People always say this but it’s misguided imo. Yes LLMs are not deterministic, but that’s totally irrelevant. You aren’t executing the LLMs output directly, you’re using the LLM to produce an artefact once that is then executed deterministically. A spec gets turned into code once. Editing the spec can cause the code to be updated but it’s not recreating the whole program each time, so why does determinism matter?
6 comments

In my experience, I'm using LLMs as my abstraction to "junior engineer". A junior engineer isn't deterministic either. I find that if you treat the LLM output like a person's output, you're good. Or at least in my projects, it's been very successful. I don't have it generate more code than I can review, or if I give it a snippet to help me fix it, if it ends up re-writing it like an ambitious engineer would do, I tell it to start over and make minimal changes.

I guess I'm not spun up about the determinism because I've been working at the "treat it like a person" level more than the "treat it like a compiler" level.

To me, it's really like an engineer who knows the docs and had a good memory rather than infallable code generator.

I work at a small company, so we don't have tons of processes in place, but I imagine that if you already had huge "standards" docs that engineers need to follow, then giving the LLM those standards would make things even better.

The thing is you can quickly teach a Junior how to respect a specification contract, so that with very minimal oversight, you get the wanted implementation. And after a few years (or months), the communication overhead get shorter. What would have been multiple rounds of meetings and review sessions are a short email and one or two demos.
What I've been learning as a 20% "harness engineer" is that in order to get the models to "learn" you need to add both documentation and static checks, as well as often custom skills. My main project at work has issues where the AI will often get super confused and step on itself trying to run tests - so the answer is writing better docs (AGENTS.md) and providing deterministic tools to work with the projects.

Large software projects (I'm thinking google3) often have large amounts of both of those things, as they're always getting new developers joining.

If it's not deterministic you can never fully trust it. In a deterministic abstraction I don't need to audit the lower levels.
Who said you need to trust it? Reviewing code is still way faster than writing code.
> Reviewing code is still way faster than writing code.

Writing code results in a much better understanding of the code than reviewing it

In fact I would say that in large complex codebases, in order to develop the same understanding of what the code is doing might actually take longer than writing it from scratch would have

But it's written to your spec; there should be no surprises!
That's the fun part! The surprise is that it's actually not written to your spec at all! It just kinda smells similar to your spec
You fully trust your coworkers?
If you don't, you may want to find a different company to work for.
this is the way LLMs _should_ be used, as an assistant to create reliable, deterministic code. and honestly, they're fantastic when used this way. build the thing you need with the LLM, then put the LLM away.

but in practice, the current obsession with agents means people are creating applications that depend entirely on sending requests to LLMs for their core functionality. which means abandoning the whole idea of deterministic software in favor of just praying that all of the prompts you put around those API requests will lead to the right result.

try distributing this spec amongst your team members, ask each of them to drive it to completion. no follow up edits. deploy to individual environments and then run a rigorous test suite against all of the deployments. see if all of them behave the same way.
They won't. So what? This is not how specs are used, no one is saying that they are a replacement for source code.
Exactly, the argument makes sense if its about inference at runtime

But that's not the case here

how do you know the artifact is correct?