

by JimDabell 243 days ago

> For a long time I wanted to find some sort of litmus test to measure this and I think I found one that is an easy to understand programming problem, can be done in a single file, yet complex enough. I have not found a single LLM to be able to build a solution without careful guidance.

Plan for solving this problem:

- Build a comprehensive design system with AI models

- Catalogue the components it fails on (like yours)

- These components are the perfect test cases for hiring challenges (immune to “cheating” with AI)

- The answers to these hiring challenges can be used as training data for models

- Newer models can now solve these problems

- You can vary this by framework (web component / React / Vue / Svelte / etc.) or by version (React v18 vs React v19, etc.)

What you’re doing here is finding the exact contours of the edge of AI capability, then building a focused training dataset to push past them. It also gives you a Rosetta Stone for translating between frameworks.
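To make the cataloguing step concrete, here is a rough sketch of how the failures could be tracked and expanded into challenge variants. Everything in it is made up for illustration: the `Attempt` record, the `failing_components` and `challenge_variants` helpers, and the component and model names are assumptions, not anything from an existing tool.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Attempt:
    """One model's attempt at one component (hypothetical record shape)."""
    component: str   # e.g. "date-picker"
    framework: str   # e.g. "React v18"
    model: str       # placeholder model name
    passed: bool     # did the output meet the component's spec?


def failing_components(attempts):
    """Return (component, framework) pairs that no model solved."""
    solved = {}
    for a in attempts:
        key = (a.component, a.framework)
        solved[key] = solved.get(key, False) or a.passed
    return sorted(key for key, ok in solved.items() if not ok)


def challenge_variants(components, frameworks):
    """Cross each failing component with every framework/version to test."""
    return list(product(components, frameworks))


# Made-up sample data, purely for illustration.
attempts = [
    Attempt("date-picker", "React v18", "model-a", False),
    Attempt("date-picker", "React v18", "model-b", False),
    Attempt("button", "React v18", "model-a", True),
    Attempt("combobox", "Vue", "model-a", False),
    Attempt("combobox", "Vue", "model-b", True),
]

failures = failing_components(attempts)
variants = challenge_variants(
    [component for component, _ in failures],
    ["React v18", "React v19", "Svelte"],
)
```

With this sample data only the date picker counts as a failure, since at least one model solved each of the other components. Each entry in `variants` is then a candidate hiring challenge, and the human answers become candidate training examples.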

I put a brain dump about the bigger picture this fits into here:

https://jim.dabell.name/articles/2025/08/08/autonomous-softw...