Hacker News
by O_nlogn 985 days ago
ChatGPT has the same behaviour, no? I've had it send most or all of a response before the censor system triggers and redacts it.
2 comments

ChatGPT's web interface has two filters. One is triggered by a moderation endpoint API call and scolds you. The other is hardcoded as a regex-type filter for copyright; it forcibly closes the pipe from the LLM instantly and doesn't acknowledge that anything happened. I say it's hardcoded because a translation to another language or a typo inserted into the output avoids it.

You can trigger this (or at least you could) by asking for the opening of A Tale of Two Cities (a public domain work!).
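
To illustrate why a typo would slip past a filter like that, here is a rough sketch of a literal-match cutoff. Everything in it is an assumption for illustration: the pattern, the `BLOCKED` name, and the `stream_until_blocked` helper are made up, and I have no idea what the real filter looks like.

```python
import re

# Hypothetical blocklist entry; the real filter's patterns are unknown.
BLOCKED = re.compile(
    r"it was the best of times, it was the worst of times",
    re.IGNORECASE,
)

def stream_until_blocked(tokens):
    """Yield tokens until the accumulated text matches BLOCKED.

    On a match the generator just stops, with no message to the caller,
    mimicking a pipe that is closed without acknowledgement.
    """
    seen = ""
    for tok in tokens:
        seen += tok
        if BLOCKED.search(seen):
            return  # silently end the stream
        yield tok
```

Fed the exact opening line in chunks, this stops as soon as the accumulated text contains the blocked phrase. Change "worst" to "wurst" and the whole thing streams through, which matches the typo and translation behaviour described above.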

The API (at least via the playground) now also has scolding built in. It sometimes triggers when you're just playing around with settings like high temperature, because the model can devolve into a mess of all sorts of nonsense text, as is the nature of transformers, but it doesn't censor the output.

Anyone know how the API deals with this?

Does it send a response, then a follow-up payload with an "ohshit plz delete that" message?

The funny thing is that the "plz delete" messages have to be executed by the browser's JavaScript. So in theory, you should be able to capture the "deleted" messages by keeping the network tab open or recording the traffic, right?

Edit: Last time I checked, ChatGPT's web interface was using server-sent events to stream the response word by word. The events were clearly visible in the network tab if you opened it early enough. So if it sends "delete" messages, they should show up in there.
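
If you record the traffic instead of watching the network tab, a small parser is enough to list every event in the stream. This is a minimal sketch of the standard `text/event-stream` framing only (`event:` and `data:` fields, a blank line ends an event, lines starting with `:` are comments). The `parse_sse` name and the `delete` event in the usage example are hypothetical; I don't know what event names or payloads ChatGPT actually sends.

```python
def parse_sse(lines):
    """Yield (event_type, data) tuples from text/event-stream lines."""
    event, data = "message", []  # "message" is the default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip it if no data arrived.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)

# Usage with a made-up capture; the "delete" event is purely illustrative.
captured = [
    "data: It was\n", "\n",
    "data: the best\n", "\n",
    "event: delete\n", "data: {}\n", "\n",
]
events = list(parse_sse(captured))
```

Anything the server pushes down the stream ends up in `events`, whether or not the page's JavaScript later hides it.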