| > how does a "tool" which constantly change its behaviour deserve being called a tool at all? Let's go with blast furnaces. They're definitely tools. They change over time - a team might constantly run one for twenty years but still need to monitor and adjust how they use it as the furnace itself changes behavior due to wear and tear (I think they call this "drift".) The same is true of plenty of other tools - pottery kilns, cast iron pans, knife sharpening stones. Expert tool users frequently use tools that change over time and need to be monitored and adjusted. I do think dogs and horses other animal tools remain an excellent example here as well. They're unpredictable and you have to constantly adapt to their latest behaviors. I agree that LLMs are unpredictable in that they are non-deterministic by nature. I also think that this is something you can learn to account for as you build experience. I just fed this prompt to Claude Code: Add to_text() and to_markdown() features to justhtml.html - for the whole document or for CSS selectors against it
Consult a fresh clone of the justhtml Python library (in /tmp) if you need to
It did exactly what I expected it would do, based on my hundred of previous similar interventions with that tool: https://github.com/simonw/tools/pull/162 |
Now let's make the analogy more accurate: let's imagine the blast furnace often ignores the operator controls, and just did what it "wanted" instead. Additionally, there are no gauges and there is no telemetry you can trust (it might have some that can the furnace will occasionally falsify, but you won't know when it's doing that).
Let's also imagine that the blast furnace changes behavior minute-to-minute (usually in the middle of the process) between useful output, useless output (requires scrapping), and counterproductive output (requires rework which exceeds the productivity gains of using the blast furnace to begin with).
Furthermore, the only way to tell which one of those 3 options you got, is to manually inspect every detail of every piece of every output. If you don't do this, the output might leak secrets (or worse) and bankrupt your company.
Finally, the operator would be charged for usage regardless of how often the furnace actually worked. At least this part of the analogy already fits.
What a weird blast furnace! Would anyone try to use this tool in such a scenario? Not most experienced metalworkers. Maybe a few people with money to burn. In particular, those who sing the highest praises of such a tool would likely be ignorant of all these pitfalls, or have a vested interest in the tool selling.