Hacker News
by davidlee1435 2603 days ago
I wonder if you could make an RL version of a GPT-2 model specifically optimized for code, where you try to compile each generated output and penalize the model whenever the compiler throws an exception or error.
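
To make the reward idea concrete, here is a rough sketch of what the "compile and penalize" signal might look like. It assumes the model is generating Python, uses the built-in `compile()` as a stand-in for "the compiler", and the `compile_reward` name and the +1/-1 values are made up for illustration. It is not wired into any actual RL training loop.

```python
def compile_reward(source: str) -> float:
    """Hypothetical reward: +1.0 if `source` compiles as Python, -1.0 otherwise."""
    try:
        # compile() only parses and byte-compiles the text; it never runs it.
        compile(source, "<generated>", "exec")
    except (SyntaxError, ValueError):
        # ValueError is raised by some Python versions for sources with null bytes.
        return -1.0
    return 1.0


# Two made-up "model outputs": one valid, one missing the colon.
samples = ["def add(a, b):\n    return a + b\n", "def add(a, b) return a + b"]
rewards = [compile_reward(s) for s in samples]
print(rewards)  # [1.0, -1.0]
```

Note this only catches syntax errors. Penalizing runtime exceptions would mean actually executing the generated code, which probably calls for a sandbox.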