Hacker News
by caeril 583 days ago
There's still a ton of room left in short, error-correcting RLHF or fine-tuning cycles.

I expect that at some point there will be "trusted" or "verified" accounts that can flag a completion as wrong and supply a correct one. Those corrections would be collated and used (daily? hourly?) to fine-tune the weights. The active inference cluster would then be switched between A/B/C, etc.
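
Roughly what I have in mind, as a toy Python sketch. Everything here is hypothetical: the names (`Correction`, `CorrectionQueue`, `ClusterRotation`, `run_cycle`) are made up for illustration, `fine_tune` is a stand-in callback rather than any real training API, and I'm assuming the standby cluster gets the new weights first and is only then promoted to active.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Correction:
    """One flagged completion plus the replacement supplied by the flagger."""
    account: str
    prompt: str
    bad_completion: str
    good_completion: str


@dataclass
class CorrectionQueue:
    """Collects corrections, accepting them only from trusted accounts."""
    trusted: set[str]
    pending: list[Correction] = field(default_factory=list)

    def flag(self, correction: Correction) -> bool:
        # Untrusted accounts are ignored (assumption: no review path for them).
        if correction.account not in self.trusted:
            return False
        self.pending.append(correction)
        return True

    def drain(self) -> list[dict[str, str]]:
        # Collate into prompt/completion pairs; the latest correction per prompt wins.
        latest = {c.prompt: c.good_completion for c in self.pending}
        self.pending.clear()
        return [{"prompt": p, "completion": g} for p, g in latest.items()]


class ClusterRotation:
    """Tracks which of the A/B/C clusters is currently serving traffic."""

    def __init__(self, names: list[str]) -> None:
        self.names = list(names)
        self.index = 0

    @property
    def active(self) -> str:
        return self.names[self.index]

    @property
    def standby(self) -> str:
        return self.names[(self.index + 1) % len(self.names)]

    def promote_standby(self) -> str:
        self.index = (self.index + 1) % len(self.names)
        return self.active


def run_cycle(
    queue: CorrectionQueue,
    rotation: ClusterRotation,
    fine_tune: Callable[[str, list[dict[str, str]]], None],
) -> str:
    """One daily/hourly cycle: fine-tune the standby on new corrections, then switch."""
    batch = queue.drain()
    if not batch:
        return rotation.active  # nothing to learn, keep serving from the same cluster
    fine_tune(rotation.standby, batch)
    return rotation.promote_standby()
```

Run `run_cycle` on a timer and pass in whatever actually does the training as `fine_tune`. The hard parts (deciding who counts as trusted, and evaluating the new weights before switching) are left out entirely.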