ARC-AGI is one of the few tests that humans can complete easily while LLMs still struggle. This model scores 45% on ARC-AGI-1 and 8% on ARC-AGI-2. The latter is comparable to Claude Opus 4 and GPT-5 High, and behind only Claude Sonnet 4.5 and Grok 4 Thinking, all from a model about 0.001% the size of commercial models.