Hacker News
by HappMacDonald 598 days ago
I wonder what would happen if the input for each token also included the logprob that token was selected with (or n/a for tokens that come from outside the LLM), and the network were trained with that extra layer of information, especially during the human feedback training at the end.
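
To make the idea a bit more concrete, here is a rough sketch of one way the input could carry that extra channel. Everything in it is an assumption on my part: the names `augment_tokens` and `embed` are hypothetical, the separate "has logprob" flag is just one possible way to encode n/a, and appending the features to the embedding is only one of several options. It is not how any existing model or library does it.

```python
from typing import List, Optional, Tuple

# Hypothetical input record: (token_id, logprob_feature, has_logprob_flag).
AugmentedToken = Tuple[int, float, float]


def augment_tokens(
    token_ids: List[int], logprobs: List[Optional[float]]
) -> List[AugmentedToken]:
    """Pair each token with the logprob it was sampled at.

    None marks a token that came from outside the model (the "n/a" case).
    It gets a 0.0 placeholder plus a 0.0 flag, so the network can tell
    "n/a" apart from a token the model was certain about (logprob 0.0).
    """
    if len(token_ids) != len(logprobs):
        raise ValueError("token_ids and logprobs must have the same length")
    out: List[AugmentedToken] = []
    for tok, lp in zip(token_ids, logprobs):
        if lp is None:
            out.append((tok, 0.0, 0.0))  # external token: no logprob available
        else:
            out.append((tok, lp, 1.0))   # model-sampled token: keep its logprob
    return out


def embed(token: AugmentedToken, table: List[List[float]]) -> List[float]:
    """Append the two extra features to the token's embedding vector.

    Assumption: simple concatenation; a learned projection would also work.
    """
    tok, lp, flag = token
    return table[tok] + [lp, flag]


# Toy usage: a 3-token vocabulary with 2-dimensional embeddings.
table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
seq = augment_tokens([0, 2, 1], [None, -0.25, -1.5])
vectors = [embed(t, table) for t in seq]
```

Here the first token plays the role of user-supplied input (n/a), and the next two are tokens the model sampled itself. During training, the network would see these widened vectors instead of the plain embeddings.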