by observationist 376 days ago

In theory, it should be possible to use base models, system prompts, and run-time tweaks to elicit specific behaviors and make them just as useful as the instruction-tuned, so-called "aligned" models.
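To make "run-time tweaks" a bit more concrete, here is a minimal sketch of one common approach: framing the task as a document the base model would plausibly continue, then cutting the completion off at a stop sequence. The function names, the header text, and the Q/A layout are all hypothetical choices for illustration, not anything prescribed by a particular model or library, and the actual model call is left out.

```python
# Sketch only: build a few-shot "document" prompt for a base (non-instruct) model.
# All names and the Q:/A: layout are illustrative assumptions.

DEFAULT_HEADER = (
    "The following is a transcript of questions answered "
    "by a careful technical expert."
)

# Assumed stop sequence: the start of the next question in the transcript.
STOP_SEQUENCES = ["\nQ:"]


def build_base_model_prompt(examples, query, header=DEFAULT_HEADER):
    """Frame a task as a document for a base model to continue.

    examples: list of (question, answer) pairs used as few-shot context.
    query: the new question; the prompt ends with an open "A:" line.
    """
    parts = [header, ""]
    for question, answer in examples:
        parts.append(f"Q: {question}")
        parts.append(f"A: {answer}")
        parts.append("")
    parts.append(f"Q: {query}")
    parts.append("A:")
    return "\n".join(parts)


def truncate_at_stop(completion, stops=STOP_SEQUENCES):
    """Cut a raw completion at the earliest stop sequence, if any appears."""
    cut = len(completion)
    for stop in stops:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut].strip()
```

The prompt string would be sent to whatever completion endpoint you have for the base model, and `truncate_at_stop` applied to the raw text that comes back, since a base model will happily keep writing further questions and answers on its own.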

The base models are eerie. People have done some amazing creative work with them, but I honestly think the base models are so disconcerting as to effectively force nearly every R&D lab out there to run to instruction tuning and otherwise avoid having to work with base models.

I think dealing with the edge cases of the good, big base models is so frustrating, uncanny, and alien that we're missing out on a lot of fun and creative use cases.

The performance hit from fine-tuning is what happens when the instruct-tuning and alignment post-training datasets distort the model of reality learned by the AI. There are all sorts of unintended consequences, ranging from full-on Golden Gate Claude levels of delusion to nearly imperceptible biases.

Robopsychology is in its infancy, and I can't wait for the nuanced and skillful engineering of minds to begin.

2 comments

orbital-decay 376 days ago

Base models are not that interesting. Pure unsupervised shoggoths just don't know what you expect them to write, and they don't perform well. The only good thing about them is variance, as further training usually kills it. Alignment is not just censorship: it literally aligns the outputs with what you (or rather the developers) want, and it improves performance on the things they want.


mock-possum 376 days ago

Eerie how? Do you have any examples you could share/quote?
