| HN Mirror



	by charcircuit 8 days ago
	>Why should we expect a model to be aligned with human interests, if it has been trained on a myriad instances of humans being degraded and violated?

	Understanding more about what exists in the real world, outside of its pile of weights, is separate from alignment. If an AI model learns that it is possible for a house to burn down, that doesn't mean the AI will want to burn down a house.

3 comments

paytonjjones 8 days ago

Exposure to horrors doesn't imply capability or desire to commit said horrors. But it does seem like kind of a prerequisite.

All else being equal, I think I'd prefer my models to be naive about human degradation and torture, for instance. Exceptions made for specialized models used for police work etc.

I do think broader alignment is necessary either way but that seems like an extra guardrail it'd be nice to have.


charcircuit 8 days ago

>I'd prefer my models to be naive about...

In practice it's been shown that LLMs perform better when trained on more diverse data. Training on images in this domain can improve performance in other domains. I would prefer to have models train on as much data as exists.

>specialized models used for police work

The benefit of AGI is that you do not need to have special models for different domains.


anematode 8 days ago

Context matters; how many of these images in the training data are taken from shock websites, and therefore associated with misanthropic commentary, versus legitimate sources like medical journals or historical pictures? Based on the samples posted by the author, it seems likely to be mostly the former. Whereas most discussions of burning a house down (not saying all, of course!) are probably in a neutral or negative context (e.g., news articles describing a crime).

"Understanding more about what exists in the real world" is a remarkable euphemism, btw.


queenkjuul 8 days ago

The AI doesn't want or understand anything; it presents a statistically likely output given an input. Including this stuff in the inputs guarantees it is available as an output.
