Hacker News new | ask | show | jobs
by aliston 1117 days ago
Is it true that the safeguards are considered part of the model? I had assumed that the "safeguards" that limit certain types of responses in ChatGPT were separate from the actual language model.
3 comments

My understanding is that most of the safeguards are inside the model, but there are some safeguards that are outside the model. In particular if you ask the API to generate copyrighted data it will, but then the connection will mysteriously break after the first few words which I assume is a separate system watching the responses.
It seems to me that there’s lots of room to change stuff that profoundly affects the range of responses without altering the base model. The prompt template alone seems like a place outside the model where we’ve seen safeguards get implemented, and other stuff that affects the usefulness of a model’s responses.
I'm of the same mind, and I believe they're keeping their ear to the ground to address any jailbreaks and loopholes. I have tricked GPT-4 into spitting out text it shouldn't have, only to have it dance around the same prompts less than 24 hours later. This "the model is the same" response seems like a deliberate deflection meant to mask the mechanical turk that ChatGPT is becoming.
ChatGPT is not the same thing as the underlying model. ChatGPT is just a UI over the model. The tweet was about the model.
ChatGPT is also the fine tuning and prompting of the model. It’s a distinct set of weights from “raw” GPT-4/etc, it’s just not a foundational model.
No, ChatGPT isn't a model. gpt-4 is a model, gpt-3.5-turbo is a model, text-davinci-003 is a model. ChatGPT is a user interface.

It has a very basic prompt on top of the existing models. There is no additional fine tuning involved.