|
|
|
|
|
by gbjw
1257 days ago
|
|
A reward is not a constraint. In the language of modern ml, rewards 'encourage' models to produce certain constrained outputs. The actual outputs during inference can be arbitrarily poor in spite of the added 'reward' during training. |
|