by vundercind 853 days ago
This is different from a bug or hitting an unknown limitation—the selling point of this was “it makes shit up” and they went “yeah, cool, let’s have it speak for us”.

Its behavior incorporates randomness and is, on purpose, unpredictable and hard to keep within bounds. Knowing that, they decided to tell a computer to follow that unpredictable instruction set and placed it in a position of speaking for the company, without a human in between. They shouldn’t have done that if they didn’t want to end up in this sort of position.

1 comment

We agree that this is an engineering failure - you can't deploy an LLM like this without guardrails.

This is also a management failure: the risks of a new technology were badly evaluated and badly managed.

We disagree in that I don't think its behaviour being hard to predict is on purpose: we have a new technology that shows great promise as a tool for working with language inputs and outputs. People are trying to use LLMs as general-purpose language processing machines - in this case as chat agents.

I'm reacting to your comment specifically because I think you are evaluating LLMs using a mental model derived from normal software failures, and LLMs (or ML models in general) are different enough to make that model ineffective.

I almost fully agree with your last comment, but the

> they decided to tell a computer to follow that unpredictable instruction set

reflects what I think is now an unfruitful model.

Before deploying a model like this you need safeguards in place to contain the unpredictability. Steps like the following would have been options:

* Fine-tuning the model to be more robust over their expected input domain,

* Using some RAG scheme to ground the outputs over some set of ground truths,

* Using more models to evaluate the output for deviations,

* Business processes to deal with evaluations and exceptions, etc.
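
To make the list above a bit more concrete, here is a rough Python sketch of how the grounding, evaluation, and exception-handling steps might fit together (fine-tuning is not shown). Everything in it is an assumption made for illustration: `POLICY_DOCS`, `retrieve`, `is_grounded`, and `answer` are hypothetical names, `generate` stands in for whatever LLM call is actually used, and the word-overlap scoring is a toy placeholder for a real retriever and a real evaluator model.

```python
from dataclasses import dataclass

# Hypothetical ground-truth snippets the bot is allowed to rely on.
POLICY_DOCS = [
    "Refund requests must be submitted before travel.",
    "Checked bags over 23 kg incur an overweight fee.",
]


@dataclass
class Reply:
    text: str
    escalated: bool  # True when a human has to take over


def _words(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def retrieve(question, docs, k=1):
    """Toy retrieval step: rank documents by word overlap with the question."""
    q = _words(question)
    ranked = sorted(docs, key=lambda d: len(q & _words(d)), reverse=True)
    return ranked[:k]


def is_grounded(draft, sources, threshold=0.8):
    """Toy evaluator: share of the draft's words that also occur in the sources.

    A real deployment would more likely use a second model here.
    """
    draft_words = _words(draft)
    if not draft_words:
        return False
    source_words = set().union(*(_words(s) for s in sources))
    return len(draft_words & source_words) / len(draft_words) >= threshold


def answer(question, generate, docs=POLICY_DOCS):
    """Ground the draft, check it, and escalate to a human if the check fails.

    `generate(question, sources)` is a stand-in for the actual LLM call.
    """
    sources = retrieve(question, docs)
    draft = generate(question, sources)
    if is_grounded(draft, sources):
        return Reply(draft, escalated=False)
    return Reply("Let me connect you with a human agent.", escalated=True)
```

With a stub `generate` that simply returns the retrieved policy text, `answer` passes the draft through; with a stub that invents a refund policy not present in `POLICY_DOCS`, the check fails and the reply is escalated instead of being sent on the company's behalf. The point is only that the unpredictable component sits inside a deterministic wrapper that decides what actually reaches the customer.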