| HN Mirror


by dontupvoteme 1097 days ago

OpenAI themselves ha(d/s) an over-the-top filter to "prevent copyright issues" which prevents it from reciting the litany or "it was the best of times.."

Why not have, at a minimum, a strict blacklist of words you do not permit in the output? Kill the model immediately if one appears and flag the user for review. (After some smoke testing you can have a non-connected GPT instance evaluate the output before it wastes a person's time, but if there's one thing I've learned from these early days of LLMs, it's that you do NOT want the general denizens of the internet to have access to it through you. OpenAI had to update their terms of service when they saw what they were getting requests for.)
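To make the blacklist idea concrete, here is a minimal sketch of a kill switch wrapped around a streamed response. The names `guarded_stream` and `BlockedOutput` are made up for illustration, and the chunk iterable stands in for whatever streaming interface you actually use. Flagging the user for review would happen wherever the exception is caught.

```python
import re
from typing import Iterable, Iterator, Set


class BlockedOutput(Exception):
    """Raised when the generated text contains a blocklisted word."""

    def __init__(self, word: str):
        super().__init__(f"blocked word in output: {word!r}")
        self.word = word


def _check(text: str, blocked: Set[str]) -> None:
    # Compare whole lowercase words only, so "bad" does not trip on "badge".
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word in blocked:
            raise BlockedOutput(word)


def guarded_stream(chunks: Iterable[str], blocklist: Iterable[str]) -> Iterator[str]:
    """Pass text through, aborting as soon as a blocklisted word shows up."""
    blocked = {w.lower() for w in blocklist}
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Release text only up to the last whitespace, so a word that is
        # split across two chunks gets checked as a whole word.
        cut = max(pending.rfind(" "), pending.rfind("\n")) + 1
        ready, pending = pending[:cut], pending[cut:]
        _check(ready, blocked)
        if ready:
            yield ready
    # Check and release whatever is left once the stream ends.
    _check(pending, blocked)
    if pending:
        yield pending
```

Exact word matching is deliberately dumb: it misses misspellings and spaced-out letters, which is where the second evaluating model would come in.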

A better solution might be more along the lines of a restricted whitelist of words, with the model itself, or the model + NLP, or model + NLP + another model, etc. cajoling the output into being both not useless and guaranteed to contain not a single word you didn't intend. I guess you could call it CorpusCoercion.

I would consider this mandatory for e.g. generating any content for children. The equivalent for lawyers is to whitelist in the actual correct legal precedents and their names so it can't make them up :)
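A rough sketch of the whitelist version, assuming the crudest possible coercion: check the output against the allowed vocabulary and re-prompt with the offending words until it passes or you give up. `generate` here is a hypothetical callable standing in for your model call, and the re-prompt wording is just a placeholder.

```python
import re
from typing import Callable, Iterable, List, Optional


def off_list_words(text: str, allowed: Iterable[str]) -> List[str]:
    """Return the words in text that are not in the allowed vocabulary."""
    vocab = {w.lower() for w in allowed}
    words = re.findall(r"[a-z0-9']+", text.lower())
    # dict.fromkeys deduplicates while keeping first-seen order.
    return list(dict.fromkeys(w for w in words if w not in vocab))


def generate_within_vocab(
    generate: Callable[[str], str],  # hypothetical model call: prompt -> text
    prompt: str,
    allowed: Iterable[str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry until the output uses only allowed words; None if it never does."""
    vocab = set(allowed)
    for _ in range(max_attempts):
        text = generate(prompt)
        bad = off_list_words(text, vocab)
        if not bad:
            return text
        # Feed the offending words back and try again.
        prompt = f"{prompt}\nDo not use these words: {', '.join(bad)}"
    return None  # fail closed: nothing off-list ever reaches the reader
```

Returning None instead of the last attempt is the point: the guarantee only holds if a failed check means no output at all.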

LLM-induced laziness and greed are already here and will only get worse. Build your kill switches and interlocks while you can, on what you can.

Also, GPT will often happily generate Python code that runs for hours, and then suddenly you realize the kernel is about to invoke the OOM killer in a minute. Even without malicious intent you can get some interesting garbage out of the webchat GPT-3 models, though "build me an analysis tree of this drive" is probably a mild risk without some containerization.
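For the runaway-script case, one cheap interlock is to run generated code in a child process with a memory cap and a wall-clock timeout. This is a minimal sketch, assuming Linux (the `resource` module is POSIX-only and `RLIMIT_AS` is not reliably enforced elsewhere); `run_limited` and its default limits are made up for illustration. It is not a sandbox and does nothing about file or network access, so it is no substitute for a container.

```python
import resource
import subprocess
import sys


def run_limited(
    code: str,
    mem_bytes: int = 512 * 1024**2,  # illustrative default: 512 MiB
    timeout_s: float = 10.0,         # illustrative default: 10 seconds
) -> subprocess.CompletedProcess:
    """Run Python source in a child process with memory and time limits."""

    def cap_memory() -> None:
        # Runs in the child just before exec. Never ask for more than the
        # existing hard limit, or setrlimit would fail.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = mem_bytes if hard == resource.RLIM_INFINITY else min(mem_bytes, hard)
        # Set soft and hard together so the child cannot raise it again.
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        preexec_fn=cap_memory,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired after killing the child
    )
```

With the cap in place, a script that tries to allocate too much gets a MemoryError inside the child instead of dragging the whole machine toward the OOM killer, and an infinite loop ends in `subprocess.TimeoutExpired` for the caller to handle.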

I would also bet decent money the privilege escalation prompt was in part (maybe a large part) the result of OpenAI making GPT-3 cheaper and worse; they probably saw the ability to save compute by using what you provided (this is the only way to get half-decent code out of it...). I would be very surprised if GPT-4 (the unmodified one via API) falls for it.

</rant>