by mirekrusin 1137 days ago
We need to move from RLHF to RLCF/RLIF/RLEF (Reinforcement Learning from Compiler/Interpreter/Execution Feedback).
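To make the idea concrete, here is a rough sketch of what a reward signal from compiler and execution feedback could look like. Everything in it is an assumption for illustration: the function name `execution_reward`, the reward values, and the choice to treat a passing test script as success are all hypothetical, not taken from any existing training setup.

```python
import os
import subprocess
import sys
import tempfile


def execution_reward(candidate_source: str, test_source: str,
                     timeout_s: float = 5.0) -> float:
    """Score model-generated code using compiler and execution feedback.

    Hypothetical reward scale (an assumption, not a standard):
      -1.0  the candidate does not compile
      -0.5  the candidate compiles but times out
       0.0  the candidate runs but a test fails or it crashes
       1.0  the candidate runs and all tests pass
    """
    # Compiler feedback: reject code that does not even parse.
    try:
        compile(candidate_source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return -1.0

    # Execution feedback: run the candidate plus its tests in a separate
    # interpreter process. This is process isolation only, not a sandbox.
    program = candidate_source + "\n\n" + test_source + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(program)
        try:
            result = subprocess.run(
                [sys.executable, path],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return -0.5

    return 1.0 if result.returncode == 0 else 0.0
```

A scalar like this would stand in for the human preference score in the RL loop. Running untrusted model output this way is unsafe outside a real sandbox, so treat it as a toy.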