Hacker News
by senko 497 days ago
> What's surprising about this is how sparsely defined the rewards are

Yeah, I would expect the rewards not to be binary. One could easily devise a scoring function in the range [0, 1] that depends on how far the model's output is from the "correct" answer (for example, one based on normalized Levenshtein distance). Whether that would actually do any good is anyone's guess.
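
For illustration, here's a rough sketch of what such a scoring function could look like. The names `levenshtein` and `graded_reward` are made up for this example, and normalizing by the longer string's length is just one possible choice, not something taken from the work being discussed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def graded_reward(output: str, target: str) -> float:
    """Hypothetical reward in [0, 1]: 1.0 for an exact match,
    0.0 when the strings are maximally different."""
    longest = max(len(output), len(target))
    if longest == 0:
        return 1.0  # both strings empty: treat as an exact match
    return 1.0 - levenshtein(output, target) / longest


if __name__ == "__main__":
    print(graded_reward("42", "42"))
    print(graded_reward("41", "42"))
    print(graded_reward("kitten", "sitting"))
```

Note that with this scoring "41" gets 0.5 against a target of "42", even though as an answer it is simply wrong. String similarity is not the same thing as being close to correct, which is part of why it's unclear whether this would help.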