Hacker News

by gck1 6 days ago

Yes, that's true. Excluding Fable, OAI models are the most refusal-heavy. However, I'd rather get a refusal than a response with poisoned output.

Since there's currently no way to verify whether poisoning happened, I don't trust Anthropic anymore, regardless of what they say.

But my trust in OAI is also brittle: what if they do it too, or start doing it?

I want a verifiable way to know that the prompt I sent was the prompt the model received. I also want to know whether anything was injected. I understand they may not be able to reveal the exact steering, but they could at least give me the steering category and its hash, or something similar.

dannyw 6 days ago

What kind of work are you getting refusals on? Genuinely curious. The only refusal I've had in recent memory was when it declined to find doorbell camera footage matching a certain description, which is fair enough; I think EU laws heavily restrict such activities (even though I'm not in the EU).

VortexLain 5 days ago

During the Iran shutdowns I've been researching how Iranians manage to access the internet by masquerading as whitelisted resources (such as hCaptcha). ChatGPT refused to look up information written in Farsi because "circumventing state regulation is a crime".

Cider9986 5 days ago

How would the AI be able to find the footage itself?

dannyw 5 days ago

I use Codex and wanted it to sort through the footage, using subagents to review it. Codex limits are fairly generous, especially when paired with mini models, which generally suit this kind of task, but even GPT-5.5 usage is pretty generous.

Again, it’s the only refusal I’ve gotten for coding/agentic tasks, and it has a basis in law somewhere, so I don’t fault OpenAI for that.

Cider9986 5 days ago

Very cool, thanks.