Hacker News
by ibejoeb 252 days ago
I don't think any of the commercial models are doing RL at the consumer level. The R is just accepting or rejecting the action, right?