by epistasis
251 days ago
If a simple majority classifier performs as well as a fancy model with 58 layers of transformers, and you use the fancy model anyway, is it the model that's doing the discovery, or the operator who chose to look in a particular place?