by lucian123
993 days ago
I'm not an expert in training LLMs, but I've heard that some people use reinforcement learning to train LLMs and align their behavior with human preferences. When designing the loss function for training, I wonder whether it's possible to assign an extremely high loss to hallucinated content. That might push the model to refrain from generating inaccurate content.
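To make the idea concrete, here is a rough sketch of what such a loss could look like. It is an illustration under assumptions, not how any particular lab does it: the `hallucinated` flags are hypothetical labels from some external fact checker, `penalty` is a made-up weight, and the penalty term borrows the unlikelihood-style form `-log(1 - p)` rather than anything described above.

```python
import math

def penalized_loss(token_probs, hallucinated, penalty=10.0, eps=1e-12):
    """Average per-token loss with a heavy penalty on flagged tokens.

    token_probs:  probability the model assigned to each token in a sequence.
    hallucinated: hypothetical 0/1 flags marking tokens judged to be inaccurate.
    penalty:      hypothetical weight applied to flagged tokens.
    """
    total = 0.0
    for p, flagged in zip(token_probs, hallucinated):
        if flagged:
            # Unlikelihood-style term: grows as the model becomes more
            # confident in a token that was flagged as hallucinated.
            total += penalty * -math.log(max(1.0 - p, eps))
        else:
            # Ordinary negative log-likelihood for unflagged tokens.
            total += -math.log(max(p, eps))
    return total / len(token_probs)

clean = penalized_loss([0.9, 0.9], [0, 0])
flagged = penalized_loss([0.9, 0.9], [0, 1])
print(clean < flagged)
```

The sketch simply assumes the flags exist; where they would come from is left open.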