Hacker News
by mckirk 762 days ago
"No dude, the bribe you offered was too much so the LLM got spooked, you need to stay in a realistic range. We've fine-tuned a local model on realistic bribe amounts sourced via Mechanical Turk to get a good starting point and then used RLMF to dial in the optimal amount by measuring task performance relative to bribe."
1 comment

RLMF: Reinforcement Learning, Mother Fucker!