| We agree that this is an engineering failure - you can't deploy an LLM like this without guardrails. It is also a management failure: the risks of a new technology were badly evaluated and managed. Where we disagree is that I don't think its behaviour is hard to predict on purpose: we have a new technology that shows great promise as a tool for working with language inputs and outputs, and people are trying to use LLMs as general-purpose language processing machines - in this case as chat agents. I'm reacting to your comment specifically because I think you are evaluating LLMs using a mental model derived from normal software failures, and LLMs, or ML models in general, are different enough to make that model ineffective. I almost fully agree with your last comment, but the line "they decided to tell a computer to follow that unpredictable instruction set" reflects what I think is now an unfruitful model. Before deploying a model like this you need safeguards in place to contain the unpredictability. Steps like the following would have been options: * Fine-tuning the model to be more robust over the expected input domain, * Using some RAG scheme to ground the outputs in a set of ground truths, * Using additional models to evaluate the output for deviations, * Business processes to deal with evaluations and exceptions,
etc. |