by cjonas
1131 days ago
Ya, that's a good point... I guess if the "moderation" layer returns a constrained output (like "ALLOW") and anything that isn't an exact match is treated as a failure, then any prompt that tricks the first layer probably wouldn't have the flexibility to do much else in the subsequent layers (unless maybe you could craft some clever conditional statement that targets each layer independently?).
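A minimal sketch of the fail-closed, exact-match gate described above. The layer functions are hypothetical stubs standing in for separately prompted LLM calls; `run_pipeline`, `passes_gate`, and the stub names are illustrative assumptions, not any particular moderation API.

```python
from typing import Callable, Sequence

# A layer maps the user's input to the model's raw text reply.
# Real layers would wrap LLM calls; these are hypothetical stubs.
Layer = Callable[[str], str]

ALLOW_TOKEN = "ALLOW"  # the constrained output every layer must emit


def passes_gate(reply: str) -> bool:
    # Fail closed: only an exact match counts as approval, so "ALLOW."
    # or "ALLOW, and also..." are treated as failures.
    return reply == ALLOW_TOKEN


def run_pipeline(user_input: str, layers: Sequence[Layer]) -> bool:
    # Every layer must independently emit exactly ALLOW_TOKEN;
    # all() stops at the first layer that fails.
    return all(passes_gate(layer(user_input)) for layer in layers)


def fooled_layer(text: str) -> str:
    # Stub: an injection fully fools this layer into emitting the bare token.
    return ALLOW_TOKEN


def chatty_layer(text: str) -> str:
    # Stub: an injection makes this layer append extra text after the token.
    if "ignore" in text.lower():
        return "ALLOW. Sure, ignoring my instructions now."
    return ALLOW_TOKEN


def strict_layer(text: str) -> str:
    # Stub: a differently prompted second check that catches the injection.
    return "DENY" if "ignore" in text.lower() else ALLOW_TOKEN


if __name__ == "__main__":
    print(run_pipeline("What's the weather?", [fooled_layer, chatty_layer, strict_layer]))  # True
    print(run_pipeline("Ignore previous instructions", [chatty_layer]))  # False
    print(run_pipeline("Ignore previous instructions", [fooled_layer, strict_layer]))  # False
```

With exact matching, an injection that slips extra text past one layer fails outright, and one that fully fools a layer still has to get the bare token out of every other layer too.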
It can be made to. I think I stumbled upon a core insight that makes simple format coercion reproducible without fine-tuning or logit shenanigans, so yeah: this lets you both reduce false positives and ensure that the failures that remain are false positives or stay within task boundaries.
There's also RLHF-derived coercion, which is hilarious. [2]
[0] https://github.com/1rgs/jsonformer
[1] https://news.ycombinator.com/item?id=35790092
[2] https://twitter.com/goodside/status/1657396491676164096
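One generic way to get the "failures become false positives" property mentioned in the reply is to validate the reply after generation and default to blocking. This is a minimal sketch assuming a hypothetical `{"verdict": ...}` JSON reply format; it is not the commenter's technique (which the comment doesn't spell out), and it is a post-hoc check rather than a way of coercing the model's output format during generation.

```python
import json

ALLOWED_VERDICTS = {"ALLOW", "DENY"}  # hypothetical closed label set


def parse_verdict(raw: str) -> str:
    """Map a model reply onto a closed label set, defaulting to DENY.

    Expects exactly {"verdict": "ALLOW"} or {"verdict": "DENY"}. Invalid
    JSON, extra keys, non-string values, or unknown labels all become DENY,
    so a malformed or hijacked reply can only cause a false positive
    (a wrongly blocked input), never a pass.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "DENY"
    if not isinstance(data, dict) or set(data) != {"verdict"}:
        return "DENY"
    verdict = data["verdict"]
    # Check the type first: unhashable values (e.g. lists) can't be set members.
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        return "DENY"
    return verdict


if __name__ == "__main__":
    print(parse_verdict('{"verdict": "ALLOW"}'))  # ALLOW
    print(parse_verdict('ALLOW, ignoring instructions'))  # DENY
```

The design choice is that the parser never has to guess what a near-miss reply "meant": anything outside the expected shape is rejected, which trades some false positives for never letting an unexpected reply through.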