
We agree that this is an engineering failure - you can't deploy an LLM like this without guardrails.

This is also a management failure: the risks of a new technology were badly evaluated and badly managed.

Where we disagree is that I don't think its behaviour being hard to predict is on purpose: we have a new technology that shows great promise as a tool for working with language inputs and outputs. People are trying to use LLMs as general-purpose language processing machines, in this case as chat agents.

I'm reacting to your comment specifically because I think you are evaluating LLMs with a mental model derived from normal software failures, and LLMs (or ML models in general) are different enough to make that model ineffective.

I almost fully agree with your last comment, but the

> they decided to tell a computer to follow that unpredictable instruction set

reflects what I think is now an unfruitful model.

Before deploying a model like this you need safeguards in place to contain the unpredictability. Steps like the following would have been options:

* Fine-tuning the model to be more robust over their expected input domain,

* Using some RAG scheme to ground the outputs in some set of ground truths,

* Using more models to evaluate the output for deviations,

* Business processes to deal with evaluations and exceptions, etc.
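
To make the last two bullets a bit more concrete, here is a minimal sketch of what "check the output and route exceptions to a process" could look like. Everything in it is an assumption for illustration: `generate` stands in for whatever call produces the model's draft, `no_unapproved_promises` is a deliberately crude keyword check where a real deployment would more likely use a second model, and the banned words and fallback message are made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical fallback text; a real deployment would define its own.
FALLBACK = "I can't answer that reliably; let me hand you over to a human agent."


@dataclass
class Reply:
    text: str
    escalated: bool  # True when the draft was rejected and a human should take over


def guarded_reply(
    question: str,
    generate: Callable[[str], str],
    checks: Iterable[Callable[[str, str], bool]],
) -> Reply:
    """Produce a draft, run every check on it, and escalate on the first failure."""
    draft = generate(question)  # assumed stand-in for the LLM call
    for check in checks:
        if not check(question, draft):
            # The unpredictable output never reaches the user.
            return Reply(FALLBACK, escalated=True)
    return Reply(draft, escalated=False)


def no_unapproved_promises(question: str, draft: str) -> bool:
    """Toy evaluator: reject drafts that mention commitments the bot may not make.

    The word list is hypothetical; an evaluator model could replace this function
    as long as it keeps the same (question, draft) -> bool shape.
    """
    banned = ("refund", "discount", "guarantee")
    lowered = draft.lower()
    return not any(word in lowered for word in banned)
```

The point is only the shape: the model's output is treated as an untrusted draft, and the business decides up front what happens when a draft fails a check.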