| HN Mirror


by manquer 383 days ago

There are plenty of out-of-band (non-prompt) controls; they just require more effort than system prompts.

You can control what goes into the training data set [1]: that is, how you label the data and what your workload with the likes of Scale AI looks like.

You can also adjust which self-supervised learning methods and biases are used and how they affect the model.

On a pre-trained model there are plenty of fine-tuning options where transfer learning approaches can be applied; distillation and LoRA both do some version of this.

Even without hundreds of thousands of GPUs available to train or fine-tune, as xAI has, we can still use inference-time strategies such as tuned embeddings or guardrails.

[1] Perhaps you could have a model trained only on child-safe content (with synthetic data if natural data is not enough). Disney or Apple would be super interested in something like that, I imagine.

1 comments

semiquaver 383 days ago

All the non prompt controls you mentioned have _nothing like_ the level of actual influence that a system prompt can have. They’re not a substitute in the same way that (say) bound query parameters are a substitute for interpolated SQL text.
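To make the SQL comparison concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table, rows, and the `hostile` input are made up for illustration; the point is only that a bound parameter is passed as data, while interpolated text is parsed as part of the query.

```python
import sqlite3

# Hypothetical in-memory table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Attacker-controlled input that tries to rewrite the WHERE clause.
hostile = "nobody' OR '1'='1"

# Interpolated SQL text: the input becomes part of the query and is parsed as SQL,
# so the OR clause matches every row.
interpolated = conn.execute(
    f"SELECT name FROM users WHERE name = '{hostile}'"
).fetchall()

# Bound parameter: the input is handed over as a value and never parsed as SQL,
# so it only matches a user literally named "nobody' OR '1'='1".
bound = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(sorted(interpolated))  # every user leaks
print(bound)                 # no rows
```

The bound version is safe by construction, not because the input was filtered well enough, which is the property the non-prompt controls above are being said to lack.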


manquer 383 days ago

Guardrails are a rough analogue to binding parameters in SQL perhaps.

These methods do work better than prompting. For example, prompting alone is much less reliable at producing JSON output that consistently adheres to a schema. OpenAI cited 40% for prompts versus 100% reliability with their fine-tuning for structured outputs [1].

Content moderation is of course more challenging and more nebulous. Justice Potter Stewart famously described his test for hard-core pornography as "I know it when I see it" (Jacobellis v. Ohio, 378 U.S. 184 (1964)) [2].

It is more difficult for a model marketed as lightly moderated like Grok.

However that doesn't mean the other methods don't work or are not being used at all.

[1] https://openai.com/index/introducing-structured-outputs-in-t...

[2] https://en.wikipedia.org/wiki/Jacobellis_v._Ohio


simonw 383 days ago

The structured data JSON output thing is a special case: it works by interacting directly with the "select next token" mechanism, restricting the LLM to picking only tokens that would be valid given the specified schema.

This makes invalid output (as far as the JSON schema goes) impossible, with one exception: if the model runs out of output tokens the output could be an incomplete JSON object.
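A toy sketch of that idea, under heavy assumptions: the vocabulary, the hand-written state machine for the schema `{"ok": <boolean>}`, and the `fake_logits` stand-in are all hypothetical, and this is not how any particular vendor implements it. It only illustrates masking the candidate tokens before picking one, plus the truncation exception.

```python
import json

# Hypothetical toy vocabulary; real tokenizers have far more entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'Sure!', ' here you go']

# Hand-written state machine for the schema {"ok": <boolean>}:
# state -> {allowed token: next state}. "done" allows nothing further.
GRAMMAR = {
    "start": {'{': "key"},
    "key": {'"ok"': "colon"},
    "colon": {':': "value"},
    "value": {'true': "close", 'false': "close"},
    "close": {'}': "done"},
    "done": {},
}

def fake_logits():
    """Stand-in for a model that always prefers chatty, non-JSON tokens."""
    scores = dict.fromkeys(VOCAB, 1.0)
    scores.update({'Sure!': 9.0, ' here you go': 8.0, 'false': 3.0, 'true': 2.0})
    return scores

def decode(max_tokens, constrained=True):
    state, out = "start", []
    for _ in range(max_tokens):
        # Constrained: only tokens the grammar allows in this state are candidates.
        # Unconstrained: every vocabulary token is a candidate.
        allowed = GRAMMAR[state] if constrained else dict.fromkeys(VOCAB, state)
        if not allowed:
            break  # grammar reached its final state
        scores = fake_logits()
        token = max(allowed, key=lambda t: scores[t])  # greedy pick among candidates
        out.append(token)
        state = allowed[token]
    return "".join(out)

print(decode(8))                      # valid JSON despite the model's preferences
print(decode(3, constrained=False))   # what the "model" wanted to say
print(decode(3))                      # token budget exhausted: incomplete JSON
```

With enough tokens the constrained output always parses with `json.loads`; with a budget of 3 it stops mid-object and does not parse, which is the one exception described above.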

Most of the other things that people call "guardrails" offer far weaker protection - they tend to use additional models which can often be tricked in other ways.


gerardatkonvo 369 days ago

Do you have any sources? Is it the same thing for tool calling parameters?


manquer 383 days ago

You are right of course.

I didn't mean to imply that all methods give 100% reliability as the structured data does. My point was just that there are non-system-prompt approaches which give on-par or better reliability and/or injection security; it is not just system prompt or bust, as other posters suggest.
