by mdp2021
467 days ago
Can I just wholeheartedly congratulate you on having found a critical benchmark for evaluating LLMs? Either they achieve 100% accuracy in your game, or they cannot be considered trustworthy. I remain very confident that modules must be added to the available architectures to achieve that "strict 100%" result.