Hacker News new | ask | show | jobs
by utensil4778 717 days ago
A hard restriction would prevent a malicious prompt from being passed to the model. Instead, it seems they've simply asked the model nicely to pretty please not answer malicious prompts.

A hard restriction would be a regex or a simpler model checking your prompt for known or suspected bad prompts and refusing outright.

1 comments

You seem to be suggesting implementing natural language processing as a series of regexes.

If NLP was that easy, we wouldn't have needed to invent transformer models, and we'd have had things as capable as ChatGPT about the same time that Microsoft was selling Encarta on CD.

The reality is, this soft fuzzy thing is the only practical way to minimise the Scunthorpe problem (and its equivalents for false negatives): https://en.wikipedia.org/wiki/Scunthorpe_problem