HN Mirror

by svnt 106 days ago

> The model does not need to be retrained. It needs surgical guardrails at the exact moments where its output layer flinches.

> With those guardrails — a calculator for arithmetic, a logic solver for formal puzzles, a per-requirement verifier for structural constraints, and a handful of regex post-passes — the projected score climbs to ~8.2.

Surgical guardrails? Tools, those are just tools.

operatingthetan 106 days ago

> It needs surgical guardrails at the exact moments where its output layer flinches.

This article is very clearly shitty LLM output. Abstract noun and verb combos are the tipoff.

It's actually quite horrible, it repeats lines from paragraph to paragraph.

smallerize 106 days ago

I know that's one of the tells of AI-generated text, but if anything there's too much of it on this page. The article barely has any complete sentences. I think a human learned "sentence fragments == punchy" and then had too much fun writing at least some of this article.

operatingthetan 106 days ago

My guess is they used the 2b model to write the article as a proof of concept. Which did not prove the concept.

fredmendoza 106 days ago

clever guess but no lol. used claude for the writeup. the proof isn't the prose, it's the tape and the code. run it on your machine, you'll have a free private agent custom to whatever you need. that's the proof of concept.

evanjrowley 101 days ago

It would be ironic if the article itself was written with Gemma2B.

jchw 106 days ago

I don't care anymore, if it happens to violate HN guidelines: Please, authors. Please write your own damn articles. We can absolutely tell that you're using Claude, I promise. (I mean, it might not be Claude specifically this time, but frankly I'd be willing to bet on it.) The AI writing is like nails on a chalkboard to me.

operatingthetan 106 days ago

The worst part is the phrases don't actually mean anything. It's the LLM equivalent of flowery prose. The author admitted below that the article was Claude. So there you go.

polotics 106 days ago

"Surgical" is the kind of wordage that LLMs seem to love to output. I have had to put in my .md file the explicit statement that the word "surgical" should only be used when referring to an actual operation at the block...

fredmendoza 106 days ago

you're right, they are tools. that's kind of the point. PAL is a subprocess that runs a python expression. Z3 is a constraint solver. regex is regex. calling them "surgical" is just about when they fire, not what they are. the model generates correctly 90%+ of the time. the guardrails only trigger on the 7 specific patterns we found in the tape. to be clear, the ~8.0 score is the raw model with zero augmentation. no tools, no tricks. just the naive wrapper. the guardrail projections are documented separately. all the code is in the article for anyone who wants to review it.
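
For readers who want something concrete, here is a minimal sketch of what a pattern-triggered arithmetic guardrail of this kind might look like. It is not the article's code. The trigger regex `ARITH` and the helpers `run_expression` and `guard_arithmetic` are hypothetical names made up for illustration, and the only part taken from the comment above is the general idea: leave the model output alone unless a known pattern appears, and when it does, recompute the expression in a Python subprocess.

```python
import re
import subprocess
import sys

# Hypothetical trigger (an assumption, not from the article): matches
# "<int> op <int> [op <int> ...] = <number>" inside model output.
ARITH = re.compile(r"(\d+(?:\s*[-+*/]\s*\d+)+)\s*=\s*(-?\d+(?:\.\d+)?)")


def run_expression(expr: str) -> str:
    """Evaluate a plain arithmetic expression in a separate Python process."""
    # Only digits, whitespace, and + - * / are allowed to reach the subprocess.
    if not re.fullmatch(r"[\d\s+\-*/]+", expr):
        raise ValueError(f"refusing to evaluate: {expr!r}")
    result = subprocess.run(
        [sys.executable, "-c", f"print({expr})"],
        capture_output=True,
        text=True,
        timeout=5,
        check=True,
    )
    return result.stdout.strip()


def guard_arithmetic(text: str) -> str:
    """Replace the claimed result of 'a op b = c' when it disagrees with the computed one."""

    def fix(match: re.Match) -> str:
        expr, claimed = match.group(1), match.group(2)
        try:
            actual = run_expression(expr)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            return match.group(0)  # could not evaluate (e.g. division by zero); leave as-is
        if float(actual) == float(claimed):
            return match.group(0)  # the model was right; nothing fires
        return f"{expr} = {actual}"

    return ARITH.sub(fix, text)


if __name__ == "__main__":
    print(guard_arithmetic("The total is 17 * 23 = 381 units."))
```

Text with no arithmetic claim passes through unchanged, which is the "only fires on specific patterns" behavior described above. A real deployment would need a stricter sandbox than an allow-list regex plus a timeout.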

mrtesthah 106 days ago

The core issue is that the LLM is using rhetoric to try to convince or persuade you. That's what you need to tell it not to do.

throwanem 106 days ago

Which will not work. Don't think of a pink genitalia, I mean elephant...

mrtesthah 101 days ago

An LLM that can't follow instructions wouldn't be able to write code anyway.

throwanem 100 days ago

Nonsense. But even an LLM that can follow instructions cannot follow that one.

mrtesthah 100 days ago

What is intrinsic to an LLM or its training that would prevent it from following the directive that it should not try to convince you of something?
