Say your webserver isn't scaling to more than 500 concurrent users. When you add more load, connections start dropping.
Is it because someone programmed a max_number_of_concurrent_users variable and a throttleExtraAboveThresholdRequests() function?
No.
Yes, humans built the entire stack of the system. Yes, every part of it was "programmed", but no, this behaviour wasn't programmed intentionally; it is an emergent property arising from system constraints.
Maybe the database connection pool is maxed out and the connections are saturating. Maybe some database configuration setting is too small or the server has too few file handles - whatever.
Whatever the root cause (and a human did implement it, if you trace the causal chain back far enough), this behaviour is an almost incidental, unintended side effect of it.
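To make the point concrete, here is a toy sketch of how a "500 user limit" can fall out of unrelated settings. Everything in it is an assumption for illustration: the `Server` fields, the tick-based timing and the numbers are hypothetical, not taken from any real server. Note that nothing in the code names a maximum number of users.

```python
from dataclasses import dataclass


@dataclass
class Server:
    pool_size: int       # hypothetical DB connection pool size
    hold_ticks: int      # ticks each request holds a connection
    patience_ticks: int  # ticks a client waits before its connection drops


def dropped_requests(server: Server, concurrent_users: int) -> int:
    """Count requests that time out when all users arrive at the same moment."""
    dropped = 0
    for position in range(concurrent_users):
        # Requests are served in batches of pool_size; each batch takes hold_ticks.
        wait = (position // server.pool_size) * server.hold_ticks
        if wait > server.patience_ticks:
            dropped += 1
    return dropped


server = Server(pool_size=50, hold_ticks=1, patience_ticks=9)
print(dropped_requests(server, 500))  # 0
print(dropped_requests(server, 600))  # 100
```

The ceiling of 500 only exists as the product of a pool size and a timeout that were each chosen for other reasons. Change either one and the "limit" moves, even though nobody ever wrote it down.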
A machine learning system is like that, but more so.
An LLM, say, is "parsing" language in some sense, but ascribing what it is doing to human design is pretty indirect.
In a way, you typing words at me has been "programmed" into you by every language interaction anyone has ever had with you.
I guess you could see it that way, but I don't think it's a particularly useful point of view.
In the same way, an LLM has been indirectly "programmed" via its model architecture, training algorithm and training data, but we are nowhere near understanding the process well enough to call this "programming" yet.
This is different from a bug or hitting an unknown limitation—the selling point of this was “it makes shit up” and they went “yeah, cool, let’s have it speak for us”.
Its behavior incorporates randomness and is unpredictable and hard to keep within bounds on purpose, and they decided to tell a computer to follow that unpredictable instruction set and place it in a position of speaking for the company, without a human in between. They shouldn’t have done that if they didn’t want to end up in this sort of position.
We agree that this is an engineering failure - you can't deploy an LLM like this without guardrails.
This is also a management failure in badly evaluating and managing the risks of a new technology.
We disagree in that I don't think that its behaviour being hard to predict is on purpose: we have a new technology that shows great promise as a tool to work with language inputs and outputs. People are trying to use LLMs as general purpose language processing machines - in this case as chat agents.
I'm reacting to your comment specifically because I think you are evaluating LLMs using a mental model derived from normal software failures, and LLMs (or ML models in general) are different enough to make that model ineffective.
I almost fully agree with your last comment, but the
> they decided to tell a computer to follow that unpredictable instruction set
reflects what I think is now an unfruitful model.
Before deploying a model like this you need safeguards in place to contain the unpredictability. Steps like the following would have been options:
* Fine-tuning the model to be more robust over their expected input domain,
* Using some RAG scheme to ground the outputs over some set of ground truths,
* Using more models to evaluate the output for deviations,
* Business processes to deal with evaluations and exceptions,
Etc
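A minimal sketch of how the last two bullets might fit together, assuming the model is just some callable that returns a draft. All names here (`guarded_reply`, `Decision`, `grounded_in_policy`, `KNOWN_POLICY`) are hypothetical, and the grounding check is a deliberately naive substring match standing in for a real RAG lookup or evaluator model:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical ground truth; a real system would look this up in a document store.
KNOWN_POLICY = {"refund window": "30 days"}


@dataclass
class Decision:
    send_to_customer: bool
    text: str
    reason: str


def grounded_in_policy(question: str, draft: str) -> bool:
    """Toy check: any draft that mentions refunds must quote the documented window."""
    return "refund" not in draft.lower() or KNOWN_POLICY["refund window"] in draft


def guarded_reply(
    question: str,
    generate: Callable[[str], str],
    checks: list[Callable[[str, str], bool]],
) -> Decision:
    """Run the model, then only release the draft if every check passes."""
    draft = generate(question)
    for check in checks:
        if not check(question, draft):
            # The exception path: hand over to a human instead of sending the draft.
            name = getattr(check, "__name__", "check")
            return Decision(False, "A human agent will follow up.", f"failed {name}")
    return Decision(True, draft, "passed all checks")
```

The point of the shape is that the unpredictable part (`generate`) never speaks for the company directly. Its output is only a draft until something deterministic, or a human, signs off on it.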