Reminds me of the old joke: "The first 90% of the code is 90% of the work. The last 10% of the code is the other 90% of the work." I have spent almost my entire adult life (since 1986) shipping products. One of the very first things I learned was that "shipping" > "designing". There's so much work in delivering products that will carry your brand and must then be supported. I liken it to having children. Conceiving them is fun. Delivering them is painful. Raising them is a lifetime of work. In my experience, the same applies to the products we ship (and charge money for).
People think otherwise about AI partly because Anthropic kept telling us that they no longer had to write or review code for most of their work. Their agent swarms just comb through their GitHub, Slack, and wikis to figure out what to do next, and another swarm of agents reviews, tests, merges, deploys, A/B tests, and reverts the code. Boris alone merged nearly 300 PRs in the past week (or two?). So the top research labs seem to have broken the productivity seal.
And then they talk about this recursively self-improving AI that is so powerful and so autonomous that they advocate every company be prepared to "pause" the effort. And their Fable/Mythos has a specific restriction, mentioned in their model card[1], that quietly limits its effectiveness on requests to tune and train models because, well, you guessed it, the models are too powerful to be used by mere mortals.
[1] We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).