Hacker News
by getnormality 320 days ago
It's interesting to compare this to the new third-generation benchmarks from ARC-AGI, which are essentially a big collection of seemingly original puzzle video games. Both Mechanize (OP) and ARC want AI to start solving more real-world, long-horizon tasks, but they differ on how to get there. Mechanize wants to get AI working directly on real software development, while ARC suggests a focus on much simpler IQ-test-style tasks.