by chant4747
705 days ago
Can you help me understand why people seem to think of Connections as a more robust indicator of (general) performance than the benchmarks typically used for evals? It seems to me that while the game is very challenging for people, it's not necessarily an indicator of generalization. I can see how it's useful, but I have trouble seeing how a low score on it would indicate low performance on most tasks. Thanks, and hopefully this isn't perceived as offensive. I'm just trying to learn more about it. Edit: I realize you yourself indicate that it's "just one benchmark". I am asking more about the broader usage I have seen here in HN comments from several people.