| > For a long time I wanted to find some sort of litmus test to measure this and I think I found one that is an easy to understand programming problem, can be done in a single file, yet complex enough. I have not found a single LLM to be able to build a solution without careful guidance. Plan for solving this problem: (1) Build a comprehensive design system with AI models. (2) Catalogue the components the models fail on (like yours). (3) Use those components as test cases for hiring challenges, since they are immune to “cheating” with AI. (4) Use the answers to these hiring challenges as training data for models. (5) Newer models trained on that data can then solve these problems. (6) Vary this by framework (web component / React / Vue / Svelte / etc.) or by version (React v18 vs React v19, etc.). What you’re doing with this is finding the exact contours of the edge of AI capability, then building a focused training dataset to push past those boundaries. It also gives you a Rosetta Stone for translating between different frameworks. I put a brain dump about the bigger picture this fits into here: https://jim.dabell.name/articles/2025/08/08/autonomous-softw... |