by fc417fc802
78 days ago

Given that models don't currently learn as they go, isn't that exactly what this benchmark is testing? If the model needs either to have been explicitly trained in a similar environment or to have a human manually input a carefully crafted prompt, then it isn't general. The latter case is just a human tuning a powerful tool. If it can add the necessary bits to its own prompt while working on the benchmark, then it's generalizing.