by neoneye2 733 days ago
I'm Simon Strandgaard and I participated in ARCathon 2022 (solved 3 tasks) and ARCathon 2023 (solved 8 tasks).

I'm collecting data on how humans solve ARC tasks, and have so far collected 4100 interaction histories (https://github.com/neoneye/ARC-Interactive-History-Dataset). Besides ARC-AGI, there are other ARC-like datasets, which can be tried in my editor (https://neoneye.github.io/arc/).

I have made some videos about ARC:

Replaying the interaction histories, you can see that people have different approaches. It's 100ms per interaction. IRL people don't solve tasks that fast. https://www.youtube.com/watch?v=vQt7UZsYooQ

When I'm manually solving an ARC task, it looks like this, and you can see I'm rather slow. https://www.youtube.com/watch?v=PRdFLRpC6dk

What is weird: the way that I implement a solver for a specific ARC task is much different from the way that I would manually solve the puzzle. The solver has to deal with all kinds of edge cases.

Huge thanks to the team behind the ARC Prize. Well done.

2 comments

The UX of your solution entry is _way_ better than the ARC site itself.
Being able to hold the mouse button down is certainly much nicer. Not being able to see the examples while you are solving makes it harder than it should be though.
I have created an issue with your suggestion. https://github.com/neoneye/ARC-Interactive/issues/67

Seeing the examples while having the editor visible. That's a good idea. I haven't explored this direction, since I had my phone (with its tiny screen real estate) in mind.

Drafts for such a UI are very welcome. However, I'm probably too lazy to code it.

That warms my heart. Thank you.

The short story: I needed something that could render thumbnails of tasks, so I could visually debug what was going on in my solver. However, I have never gotten around to making the visual inspection tool. After I had the thumbnail renderer, in mid-January 2024, it eventually turned into what it is now.

"Here is a challenge, designed to be unsolvable or so. We'll give you a bazillion dollars if you complete the challenge, and, in the meantime, we will use your attempts to train an AI that will be worth the cost!!"
In the most charitable interpretation of this comment - I can understand the feeling, when so much of social media interaction is in the form 'Post a picture of you as a baby, 10 year old, and current age!'. Those and many other instances can bring out excessive skepticism.

But the people involved in this haven't signaled that they are on that path, either in the message about the challenge (precisely the opposite) or seemingly in their careers so far.

So I guess I don't share the concern but a better way to phrase your comment could be -

"how can we be sure the human-provided solutions won't turn out to be just fodder for training a RL model or something that will later be monetized, closed and proprietary? Do the challenge organizers provide any guarantees on that?"

No, you missed the point. The striking thing about ARC is the puzzles are super easy, for humans. The average person solves 85% of the tasks, but the world's best LLMs are only solving 5%. The challenge is to simply make an AI score as well as the average human.
Did you even try the puzzles? They’re not particularly “unsolvable”.
ARC-AGI: "here are some pretty simple puzzles, we'll give you a million dollars to solve them!"

Human: "They're quite challenging, this might be a trick to engage activity for the purpose of training models."

skrebbel: "You're stupid".

Did you try the puzzles?
No. What is the purpose of this competition? Unlikely that the reason for it is to pay out an enormous reward, right? Easy or not easy, the fortune is only awarded to the system that solves the puzzles. The reward is too valuable to be given away easily. Ipso facto, solving the puzzles is deemed challenging by those who present the competition.
Are you writing this under every challenge with a monetary reward? The point of the challenge is that it is hard to do for an AI and easy for a human. Of course it is not easy to solve, that’s the point of the challenge. But the puzzle itself is not very hard.