|
|
|
|
|
by _pdp_
249 days ago
|
|
This is exactly the direction I am seeing agent go. They should be able to write their own tools and we are soon launching something about that. That being said... LLMS are amazing for some coding tasks and fail miserably at others. My hypothesis is that there is some sort of practical limit to how many concepts an LLM can hold into account no matter the context window given the current model architectures. For a long time I wanted to find some sort of litmus test to measure this and I think I found one that is an easy to understand programming problem, can be done in a single file, yet complex enough. I have not found a single LLM to be able to build a solution without careful guidance. I wrote more about this here if you are interested: https://chatbotkit.com/reflections/where-ai-coding-agents-go... |
|
Plan for solving this problem:
- Build a comprehensive design system with AI models
- Catalogue the components it fails on (like yours)
- These components are the perfect test cases for hiring challenges (immune to “cheating” with AI)
- The answers to these hiring challenges can be used as training data for models
- Newer models can now solve these problems
- You can vary this by framework (web component / React / Vue / Svelte / etc.) or by version (React v18 vs React v19, etc.)
What you’re doing with this is finding the exact contours of the edge of AI capability, then building a focused training dataset to push past those boundaries. Also a Rosetta Stone for translating between different frameworks.
I put a brain dump about the bigger picture this fits into here:
https://jim.dabell.name/articles/2025/08/08/autonomous-softw...