by HappMacDonald
598 days ago
I wonder what would happen if each input token also carried the logprob with which it was selected (or n/a for tokens that came from outside the LLM), and the network were trained with that extra information, especially during the human feedback training at the end.
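
To make the idea concrete, here is a rough sketch of what the extra input might look like. Everything in it is an assumption for illustration: the names `logprob_features` and `augment_embeddings` are hypothetical, the toy 2-dimensional embeddings are made up, and encoding "n/a" as a separate 0/1 flag is just one possible choice. It is not how any existing model handles its inputs.

```python
import math

# Hypothetical sentinel: tokens from outside the model (prompt, user text) have no logprob.
NO_LOGPROB = None


def logprob_features(logprob):
    """Map an optional logprob to two extra input features: [value, is_known]."""
    if logprob is NO_LOGPROB:
        return [0.0, 0.0]  # value is a placeholder; the 0.0 flag marks it as n/a
    return [logprob, 1.0]


def augment_embeddings(embeddings, logprobs):
    """Append the logprob features to each token's embedding vector."""
    if len(embeddings) != len(logprobs):
        raise ValueError("need one logprob (or None) per token")
    return [emb + logprob_features(lp) for emb, lp in zip(embeddings, logprobs)]


# Two tokens from outside the model, then one sampled token chosen with p = 0.25.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy values, assumed
logprobs = [None, None, math.log(0.25)]
augmented = augment_embeddings(embeddings, logprobs)
```

The separate flag is there so the network can tell "n/a" apart from a real logprob of 0.0 (a token chosen with probability 1). Whether training on this would actually help is exactly the open question.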