by tngranados
204 days ago
The point of benchmarking that is to check for hallucinations and overfitting. Does the model actually look at the picture to count the legs, or does it just see that it's a dog and answer four because it knows dogs usually have four legs? It's a perfectly valid benchmark and a very telling one.