by ggm
498 days ago
Adversarial play might be interesting. I'm assuming the goal here is to use the feedback as training input, fitness to expectations and the like. So a random chance of a bad-actor adversary? Or high scores for lying? "The traitor"-style team models? I suppose I'm arguing for gamification with a leaderboard and hubristic bragging rights.
|
The LLM we use is mistral-large-latest; we haven't done any training on the data yet.