Hacker News
by jsw97 2 days ago
Given the high rate of false positives people are reporting for the non-silent cybersecurity, biological, etc., safeguards, there is a strong likelihood that you will encounter silently nerfed behavior even if you are _not_ violating their TOS.

Ultimately this will be evident in the way customers / external benchmarkers experience Fable. Hopefully competition will drive future models toward a lower false positive rate. Until that happens, Mythos and Fable users seem likely to have pretty divergent experiences.

5 comments

It's such an obviously bad policy, it's mind-boggling that they thought this was a good idea. It just breeds paranoia and mistrust, especially when people are already a bit paranoid about silent model quantification for cost cutting reasons.
It's not paranoia when the entity you are dealing with can't be trusted and will do everything to abuse your trust.
What's the alternative? Not release the model at all?

"Make the guardrails better" is very hard and probably not worth the effort.

The alternative is to be explicit when you nerf, so users know what they are working with.
I guess people would just game the system and find ways around these guardrails.
They have enough info on you and your sessions to eventually catch you, label you as a bad-faith actor and ban you automatically. I don't think many would risk it.
That seems to be working well for Mythos. Just never release it and keep talking about how 'dangerous' it is to pump up the IPO price.
Do you mean "quantization" not quantification?
Yup, I meant to write quantization there.
Another "knob" is reducing the thinking time...
I'm a medical physicist. I use the word nuclear a lot. Opus is fine (well, 99% of the time - I've certainly hit the CBRN filters a few times and have even been invited to email Anthropic about the false positives).

Fable has literally refused to work on any of my problems (even those about fluid dynamics!) and just tells me that I'm violating Anthropic's AUP.

This problem is compounded by the fact that you can be banned (really by any provider) based on an algorithm, and the processes for restoring your account don't seem to work very well. So be careful with your queries, basically, or you might get locked out.
I encountered this when I was checking why my gluten-free bread came out of the bread machine the way it did. I guess it latched onto some yeast-related points and it fell back to Opus...

Having said that, on this query I've seen very little difference in the quality, there's nothing to be "2x as good on" for the "2x quota usage", so shrugs?

If a benchmark is affected the model owner will almost certainly tune it, so there will be a game of cat and mouse...

Honestly, wouldn't surprise me if the AI companies try to detect benchmarking. Most hardware companies do...

I mean, the other day I got blocked from Claude for asking about releasing genetically modified sterile mosquitoes; I'm sure everything will be totally fine as Anthropic's restrictions are completely reasonable, measured and appropriate.