| Yes. These are all the same points I used to believe until recently... in fact the article I wrote two months earlier was all about LLMs not being able to think like us. I still haven't squared how I can believe both things at the same time. The point of my article was to try to explain why I think otherwise now. Responding to your thoughts in sequence: - These systems can re-abstract and decompose things just fine. If you want to make it resilient or scalable it will follow whatever patterns you want to give it. These patterns are well known and are definitely in the training data for these models. - I didn't jump to the conclusion that doing small things will make anything possible. I listed a series of discoveries/innovations/patterns/whatever that we've worked on over the past two years to increase the scale of the programs that can be generated/worked-on with these systems. The point is I'm now seeing them work on systems at the level of what I would generally write at a startup, open source project, or enterprise software. I'm sure we'll get some metrics soon on how functional these are for something like Windows, which, I believe is literally the world's single largest code base. - "Creativity" and novel-seeking functions can be added to the system. I gave a recent example in my post about how I asked it to write three different approaches to integrate two code bases. In the old world this would look like handing a project off to three different developers and seeing what they came up with. You can just brush this all off with "they're just knowledge bases" but then you have to explain how a knowledge base can write software that would take a human engineer a month on command. We have developed the principle "hard to do, easy to review" that helps with this, too. Give the LLM-system a task that would be tedious for a human and then make the results easy for a human to review. This allows forward progress to be made on a task at a much-accelerated pace.
Finally, my post was about programming... how much creativity do you generally see in most programming teams where they take a set of requirements from the PM and the engineering manager and turn that into code on a framework that's been handed to them? Or take the analogy back in time... how much creativity is still exhibited in assembly compilers? Once creativity has been injected into the system, it's there. Most of the work is just in implementing the decisions. - You hit the point that I was trying to make... and what sets something like Amplifier apart from something like Claude Code. You have to do MUCH less prompting. You can just give it an app and tell it to improve it, fix bugs, and add new features based on usage metrics. We've been doing these things for months. Your assertion that "we would have already replaced ALL programmers" is the logical next conclusion... which is why I wrote the post. Take it from someone who has been developing these systems for close to three years now... it's coming. Amplifier will not be the thing that does this... but it shows techniques and patterns that have solved the "risky" parts enough to show the products will be coming. |
No? It absolutely does not do this correctly. It does what "looks" right. Not what IS right. And that ends up being wrong literally the majority of the time for anything even mildly complex.
"I'm sure we'll get some metrics soon on how functional these are for something like Windows, which, I believe is literally the world's single largest code base."
Now that's just not true at all. Windows doesn't come anywhere close to the size of Google's codebase.
"and then make the results easy for a human to review."
This is in no way doable for anything an LLM produces that isn't completely trivial. Software is genuinely hard and time-consuming if you want it to not be brittle, to address the things it needs to, and to make trade-offs that are NOT detrimental to the future of your product.