Hacker News


	by whimsicalism 1130 days ago
	It would require some sort of continuously tuned arbiter, i.e., something similar to what is used in RLHF, as well as an adversarial-style scheme à la GAN. But I'm really spitballing here; research could absolutely go in a different direction. Still, let's say you reduced it to some sort of "trying to prove a statement" task that can be verified, along with a discriminator model. Then you could compare two iterations based on whether they accurately prove the statement in English.
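	To make the "compare two iterations" idea concrete, here is a toy sketch. It assumes each candidate proof is judged first by a verifier (correctness) and then by a discriminator (how natural the English reads), with correctness always dominating. Every name here (`judge`, `prefer`, `compare_iterations`, `toy_verifier`, `toy_discriminator`) is hypothetical. The arithmetic "statements" stand in for real theorems, and the word-count "discriminator" is a placeholder for a trained model. This is not a real training loop, only the pairwise comparison step.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical: a model "iteration" maps a statement to an English proof attempt.
Model = Callable[[str], str]
Verifier = Callable[[str, str], bool]
Discriminator = Callable[[str], float]

@dataclass
class Judgement:
    verified: bool  # did the (hypothetical) checker accept the proof?
    fluency: float  # discriminator score; higher = more human-like English

def judge(statement: str, proof: str, verifier: Verifier,
          discriminator: Discriminator) -> Judgement:
    return Judgement(verifier(statement, proof), discriminator(proof))

def prefer(a: Judgement, b: Judgement) -> int:
    """1 if a wins, -1 if b wins, 0 on a tie. Correctness dominates;
    the discriminator only breaks ties between equally correct proofs."""
    if a.verified != b.verified:
        return 1 if a.verified else -1
    if a.fluency != b.fluency:
        return 1 if a.fluency > b.fluency else -1
    return 0

def compare_iterations(old: Model, new: Model, statements: List[str],
                       verifier: Verifier, discriminator: Discriminator) -> float:
    """Fraction of statements on which `new` beats `old` (ties count as half)."""
    score = 0.0
    for s in statements:
        r = prefer(judge(s, new(s), verifier, discriminator),
                   judge(s, old(s), verifier, discriminator))
        score += 1.0 if r == 1 else 0.5 if r == 0 else 0.0
    return score / len(statements)

# --- Toy stand-ins (placeholders, not real components) ---
def toy_verifier(statement: str, proof: str) -> bool:
    # Statement "a+b=c": accept if it is true and the proof's last number equals a+b.
    lhs, rhs = statement.split("=")
    true_val = sum(int(t) for t in lhs.split("+"))
    nums = re.findall(r"-?\d+", proof)
    return bool(nums) and int(nums[-1]) == true_val == int(rhs)

def toy_discriminator(proof: str) -> float:
    # Placeholder "fluency": longer sentences score higher, capped at 1.0.
    return min(len(proof.split()) / 10, 1.0)

old_model: Model = lambda s: "It is 6."
new_model: Model = lambda s: (
    f"Adding the terms gives {sum(int(t) for t in s.split('=')[0].split('+'))}."
)
statements = ["2+3=5", "10+4=14"]
```

With these stand-ins, `compare_iterations(old_model, new_model, statements, toy_verifier, toy_discriminator)` should come out at 1.0, since only the newer iteration produces verified proofs. The design choice of letting the verifier veto the discriminator is one guess at how the two signals might combine. A real scheme could just as plausibly weight them.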