|
|
|
|
|
by HarHarVeryFunny
589 days ago
|
|
It's cheating to the extent that it misrepresents the strength and reasoning ability of the model, to the extent that anyone is going to look at it's chess playing results and incorrectly infer this says anything about how good the model is. The takeaway here is that if you are evaluating different models for your own use case, the only indication of how useful each may be is to test it on your actual use case, and ignore all benchmarks or anything else you may have heard about it. |
|