| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by aliston 1117 days ago
	Is it true that the safeguards are considered part of the model? I had assumed that the "safeguards" that limit certain types of responses in ChatGPT were separate from the actual language model.

3 comments

kmod 1117 days ago

My understanding is that most of the safeguards are inside the model, but there are some safeguards that are outside the model. In particular if you ask the API to generate copyrighted data it will, but then the connection will mysteriously break after the first few words which I assume is a separate system watching the responses.

link

helpfulclippy 1117 days ago

It seems to me that there’s lots of room to change stuff that profoundly affects the range of responses without altering the base model. The prompt template alone seems like a place outside the model where we’ve seen safeguards get implemented, and other stuff that affects the usefulness of a model’s responses.

link

z3c0 1117 days ago

I'm of the same mind, and I believe they're keeping their ear to the ground to address any jailbreaks and loopholes. I have tricked GPT-4 into spitting out text it shouldn't have, only to have it dance around the same prompts less than 24 hours later. This "the model is the same" response seems like a deliberate deflection meant to mask the mechanical turk that ChatGPT is becoming.

link

weird-eye-issue 1116 days ago

ChatGPT is not the same thing as the underlying model. ChatGPT is just a UI over the model. The tweet was about the model.

link

danpalmer 1116 days ago

ChatGPT is also the fine tuning and prompting of the model. It’s a distinct set of weights from “raw” GPT-4/etc, it’s just not a foundational model.

link

weird-eye-issue 1116 days ago

No, ChatGPT isn't a model. gpt-4 is a model, gpt-3.5-turbo is a model, text-davinci-003 is a model. ChatGPT is a user interface.

It has a very basic prompt on top of the existing models. There is no additional fine tuning involved.

link