If your customer base is so broad that you can't define a clear outcome for your niche, your company probably isn't focused enough. Especially for a startup.
Take, for instance, a customer support Agent that is supposed to resolve tickets. Assume it resolves around 30% of tickets by an objective measure. Do you think that can't be captured and agreed upon by both sides?
Already, today, human customer support agents' performance is measured in ticket resolution, and the Goodhart's Law consequences of that are trivially visible to anyone who's ever tried to get a ticket actually resolved, as opposed to simply marked "resolved" in a ticketing system somewhere…
Interesting. Let's take the case of infra spend on AWS. Amazon says you made 100k serverless invocations and charges you for them. Why do you trust them?
The comparison doesn't quite hold because AWS is a utility; they aren't an arbiter of quality. Amazon charges for a serverless call regardless of whether your code worked or crashed. You pay for the effort (compute), which is verifiable and binary.
Once you shift to billing for outcomes like "resolutions," the vendor switches from a utility provider to the judge and jury of their own performance. At scale, that creates a "fox guarding the henhouse" dynamic. The friction of auditing those outcomes to ensure they aren't just Goodharted metrics eventually offsets the simplicity the model promises. Frankly, I just cannot and will not trust the judgment of tech companies who evangelize their own LLM outputs.
How do you verify AWS charges? By inspecting logs? There goes the arbiter.
I get the binary part. Is the biggest difference the subjective component of the outcome? However, a tech provider, especially an Agent provider, has to reduce the subjective to a quantitative metric when selling. If that can't be done, I'm not sure what we'd be buying from Agent builders/providers.