Hacker News new | ask | show | jobs
by svnt 59 days ago
> The model does not need to be retrained. It needs surgical guardrails at the exact moments where its output layer flinches.

> With those guardrails — a calculator for arithmetic, a logic solver for formal puzzles, a per-requirement verifier for structural constraints, and a handful of regex post-passes — the projected score climbs to ~8.2.

Surgical guardrails? Tools, those are just tools.

2 comments

>It needs surgical guardrails at the exact moments where its output layer flinches.

This article is very clearly shitty LLM output. Abstract noun and verb combos are the tipoff.

It's actually quite horrible, it repeats lines from paragraph to paragraph.

I know that's one of the tells of AI-generated text, but if anything there's too much of it on this page. The article barely has any complete sentences. I think a human learned "sentence fragments == punchy" and then had too much fun writing at least some of this article.
My guess is they used the 2b model to write the article as a proof of concept. Which did not prove the concept.
clever guess but no lol. used claude for the writeup. the proof isn't the prose, it's the tape and the code. run it on your machine, you'll have a free private agent custom to whatever you need. that's the proof of concept.
It would be ironic if the article itself was written with Gemma2B.
I don't care anymore, if it happens to violate HN guidelines: Please, authors. Please write your own damn articles. We can absolutely tell that you're using Claude, I promise. (I mean, it might not be Claude specifically this time, but frankly I'd be willing to bet on it.) The AI writing is like nails on a chalkboard to me.
The worst part is the phrases don't actually mean anything. It's the LLM equivalent of flowery prose. The author admitted below that the article was Claude. So there you go.
"Surgical "is the kind of wordage that LLMs seem to love to output. I have had to put in my .md file the explicit statement that the word "surgical" should only be used when referring to an actual operation at the block...
you're right, they are tools. that's kind of the point. PAL is a subprocess that runs a python expression. Z3 is a constraint solver. regex is regex. calling them "surgical" is just about when they fire, not what they are. the model generates correctly 90%+ of the time. the guardrails only trigger on the 7 specific patterns we found in the tape. to be clear, the ~8.0 score is the raw model with zero augmentation. no tools, no tricks. just the naive wrapper. the guardrail projections are documented separately. all the code is in the article for anyone who wants to review it.
The core issue is that the LLM is using rhetoric to try to convince or persuade you. That's what you need to tell it not to do.
Which will not work. Don't think of a pink genitalia, I mean elephant...
An LLM that can't follow instructions wouldn't be able to write code anyway.
Nonsense. But even an LLM that can follow instructions cannot follow that one.
What is intrinsic to an LLM or its training that would prevent it from following the directive that it should not try to convince you of something?