
Hacker News

by BillStrong 33 days ago

Even in humans, negative stimuli carry more weight than positive ones, in the general case.

Without having read it yet, my first thought would be to test a general ratio, something like the ratios people cite for human interpersonal relationships (say 30% negative, the rest positive). The positive feedback should also be targeted: reinforcement not just for doing a good job, but for improving.

And make sure the negative feedback is targeted too, pointing out tendencies to avoid rather than just specific instances.

Of course, most human interaction online has none of this, so it would be hard to replicate.


sebastian 33 days ago

Yeah, I like the ratio framing. That does seem like the kind of experiment you'd want to run next.

The thing I'd be curious to separate out is ratio vs. density. The fiction examples were positive, but a lot of their tokens are still spent on ordinary storytelling. The targeted examples put much more of the training signal on the AI being in the relevant situation and choosing against the bad option.

That makes me think the next thing to test is not just the positive/negative mix, but how much of the data is actually about the failure mode.