by senko
497 days ago
> What's surprising about this is how sparsely defined the rewards are

Yeah, I would expect the rewards not to be binary. One could easily devise a scoring function in the range [0, 1] that depends on how far the model's output is from the "correct" answer (for example, one minus the normalized Levenshtein distance). Whether that would actually do any good is anyone's guess.
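
For concreteness, here is a minimal sketch of what such a non-binary score could look like. The names `levenshtein` and `reward` are made up for illustration, and normalizing by the length of the longer string is just one plausible choice, not anything taken from the work being discussed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def reward(output: str, target: str) -> float:
    """Hypothetical graded reward: 1.0 for an exact match, 0.0 for maximal distance."""
    longest = max(len(output), len(target))
    if longest == 0:
        return 1.0  # two empty strings count as a perfect match
    # Edit distance never exceeds the longer length, so this stays in [0, 1].
    return 1.0 - levenshtein(output, target) / longest
```

With this, `reward("kitten", "sitting")` comes out to 1 - 3/7, whereas a binary reward would just say 0. Whether the extra gradient signal helps in practice is a separate question.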