| HN Mirror



	by senko 544 days ago
	> What's surprising about this is how sparsely defined the rewards are

	Yeah, I would expect the rewards not to be binary. One could easily devise a scoring function in the range [0, 1] that depends on how far the model's answer is from the "correct" answer (for example, normalized Levenshtein distance). Whether that would actually do any good is anyone's guess.
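	A rough sketch of the kind of graded reward described above. It assumes the reward is simply 1 minus the Levenshtein distance divided by the length of the longer string. The function names `levenshtein` and `graded_reward` are hypothetical, not taken from any particular RL training setup.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    # prev is the previous DP row: prev[j] = distance(a[:i-1], b[:j])
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance(a[:i], "") = i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def graded_reward(answer: str, reference: str) -> float:
    """Hypothetical dense reward in [0, 1]: 1.0 for an exact match,
    falling toward 0.0 as the normalized edit distance grows."""
    longest = max(len(answer), len(reference))
    if longest == 0:
        return 1.0  # both strings empty: treat as an exact match
    return 1.0 - levenshtein(answer, reference) / longest


if __name__ == "__main__":
    print(graded_reward("42", "42"))  # exact match
    print(graded_reward("42", "43"))  # one character off
```

	With a binary reward, "43" against a reference of "42" scores 0. Here it scores 0.5. One caveat: for math-style answers, being close as a string says little about being close to correct, which is one reason it's unclear whether this would help.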