HN Mirror

Hacker News


	by mckirk 762 days ago
	"No dude, the bribe you offered was too much, so the LLM got spooked. You need to stay in a realistic range. We've fine-tuned a local model on realistic bribe amounts sourced via Mechanical Turk to get a good starting point, and then used RLMF to dial in the optimal amount by measuring task performance relative to bribe."


RLMF: Reinforcement Learning, Mother Fucker!