by rfoo
727 days ago
> You're saying the publicly available problem set isn't indicative of the distribution of the test set?
Yes. From https://arcprize.org/guide:
Please note that the public training set consists of simpler tasks whereas the public evaluation set is roughly the same level of difficulty as the private test set.
The public training set is significantly easier than the others (public evaluation and private evaluation set) since it contains many "curriculum" type tasks intended to demonstrate Core Knowledge systems. It's like a tutorial level.
|
|
Like with our toy "algebra" examples: sure, there's a lot of emphasis on repetition and rote in primary education on these subjects, and that's one way to get people more consistent at getting the calculations right, but to be frank I don't think it's the best way, or as crucial as it's made out to be. What someone really needs to understand about algebra is how the notation works and what the symbols mean. Like, I can't unsee the concept of "+" as a function that takes two operands and counts up as many steps as the right operand says, starting at the value of the left operand.
When looking at algebra, the process I go through relies on a bunch of conceptual frameworks, like "Anything in the set of Arabic numerals can be considered a literal value", "Anything in the Roman alphabet is likely a variable", and "Any symbol is likely an infix operator, that is, a function whose operands are on either side of it". Some of the concepts I'm using are just notational convention. At some point I memorized the set of Arabic numerals: what they look like, what each of them means, and how they're written in relation to each other to express quantities combinatorially. Some of the concepts are logical relations about quantities, or definitions of functions. But crucially, the form of these distillations makes them composable.
If I didn't really understand what "+" does, then maybe someone could give me some really bad homework that goes:
1 + 30 = 31
20 + 7 = 27
3 + 10 = 13
And then present me with the problem
20 + 10 + 3 = ?
And I'd think the answer is
20 + 10 + 3 = 213
That demonstrates some model of how to do these calculations, but it doesn't really capture all the important relationships the symbols represent.
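To make the gap concrete, here's a rough Python sketch of the two models. Both function names are made up for illustration, and `add_by_pattern` is just one guess at a shallow rule that happens to fit the three homework lines (write the nonzero tens digits in order, then the nonzero ones digits). It isn't meant as a claim about how anyone, or any ARC solver, actually learns.

```python
def add_by_counting(left: int, right: int) -> int:
    """'+' as counting: start at `left` and step up once per unit of `right`.

    Assumes non-negative integers; `total += 1` stands in for "next number".
    """
    total = left
    for _ in range(right):
        total += 1
    return total


def add_by_pattern(*operands: int) -> int:
    """Hypothetical shallow rule that fits the homework without understanding '+':
    write down the nonzero tens digits in order, then the nonzero ones digits.
    """
    tens = [str(n // 10 % 10) for n in operands if n // 10 % 10]
    ones = [str(n % 10) for n in operands if n % 10]
    return int("".join(tens + ones))


# The three homework lines: both models agree on every one of them.
homework = [((1, 30), 31), ((20, 7), 27), ((3, 10), 13)]
for operands, answer in homework:
    assert add_by_pattern(*operands) == answer
    assert add_by_counting(*operands) == answer

# The held-out problem is where they split apart.
print(add_by_pattern(20, 10, 3))                      # 213
print(add_by_counting(add_by_counting(20, 10), 3))    # 33
```

The counting version composes (you can feed its result back into itself), which is the property the pattern-matching version lacks.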
We can raise any number of objections to this training set. Like, I wasn't presented with any examples of adding two two-digit numbers together! Or even any examples where I needed to combine numbers in the same rank!
Definitely all true. Probably mistakes we could make in educating a kid on algebraic notation too. It's really hard to do these things in a way that both accomplishes the goal and is testable and quantifiable. But many humans demonstrate the ability to distill a conceptual understanding without exhaustive examples of a concept's properties, so that's one of the things ARC seems to want to test. It's hard to get this perfectly right, but it's a reasonable thing to want.