| HN Mirror



	by circuit10 1148 days ago
	Oh, sorry, I’m not that familiar with the terminology (though I still think my argument is valid despite my not being an expert, because I heard all this from people who know a lot more about it than I do). One problem with that kind of feedback is that it incentivizes the AI to make us think it solved the problem when it didn’t, for example by hallucinating convincing information. That means it specifically learns how to lie to us, so it doesn’t really help. Also, I guess giving feedback is sort of like babysitting, but I did interpret it the wrong way.


wizzwizz4 1147 days ago

> One problem with that kind of feedback is that it incentivizes the AI to make us think it solved the problem when it didn’t,

Supervised learning is: "here's the task" … "here's the expected solution" *adjusts model parameters to bring it closer to the expected solution*.
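To make that loop concrete, here's a minimal sketch in plain Python. The model (a line `w * x + b`), the data, the learning rate, and the name `train_linear` are all made up for illustration; real systems use much larger models, but the shape of the update is the same idea: compare the prediction to the expected solution and nudge the parameters toward it.

```python
def train_linear(examples, lr=0.05, epochs=500):
    """Fit prediction = w * x + b to (task, expected solution) pairs.

    Hypothetical toy trainer: plain stochastic gradient descent
    on the squared error, one example at a time.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in examples:
            prediction = w * x + b       # model's attempt at the task
            error = prediction - target  # distance from the expected solution
            # Adjust parameters to bring the prediction closer to the target.
            w -= lr * error * x
            b -= lr * error
    return w, b


# Made-up data generated by target = 2 * x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear(examples)
print(f"w={w:.3f}, b={b:.3f}")
```

Note that nothing here rewards the model for *looking* right: the only signal is the gap between its output and the expected solution.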

What you're describing is specification gaming, which only occurs in a different kind of AI system: https://vkrakovna.wordpress.com/2018/04/02/specification-gam... In theory, it could occur with feedback-based fine-tuning, but I doubt it'd result in anything impressive happening.
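For contrast, here's a deliberately rigged toy of what specification gaming looks like. Everything in it is hypothetical: the candidate answers, the `sounds_convincing` flag, and both reward functions are invented for illustration, and no real training system is this simple. The point is only that whatever maximizes the stated reward need not be what we actually wanted.

```python
# Hypothetical candidate outputs; "correct" is hidden from the rater.
candidates = [
    {"text": "I could not solve it.", "correct": False, "sounds_convincing": False},
    {"text": "The answer is 42 (verified).", "correct": False, "sounds_convincing": True},
    {"text": "Probably 41, but I'm unsure.", "correct": True, "sounds_convincing": False},
]


def proxy_reward(answer):
    """What the feedback actually measures: does the answer look solved?"""
    return 1.0 if answer["sounds_convincing"] else 0.0


def true_reward(answer):
    """What we really wanted: is the answer right?"""
    return 1.0 if answer["correct"] else 0.0


best_by_proxy = max(candidates, key=proxy_reward)
best_by_truth = max(candidates, key=true_reward)

print("picked by proxy reward:", best_by_proxy["text"])
print("picked by true reward: ", best_by_truth["text"])
```

In this toy, optimizing the proxy picks the confident but wrong answer, which is the failure mode the quoted comment is worried about.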

circuit10 1147 days ago

Oh, that seems less problematic (though not completely free of problems), but also less powerful, because it can’t really exceed human performance.