Hmmm, I suppose we could, if there were a focus on it.
However, it's always interesting to watch how people react:
* Traditional programming gives very strict, predictable behaviour, but it's rigid.
* Okay, so let's train models to act like biological life, which is super flexible but has a chance to make mistakes.
But then people complain about the latter making mistakes. Realistically, though, we can combine the two so that what a model says is grounded in fact.
The easiest way would be for people to use an LLM to browse hard facts and then double-check against the sources. But relying on humans not to be lazy is about as reliable as relying on an ML model to behave like traditional programming, so it's probably better to train our wild model with as many built-in constraints as possible and then wrap it in more traditionally programmed hard limits.
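To make the "wrap it in hard limits" idea concrete, here's a rough sketch. Everything in it is made up for illustration: `fake_model` stands in for a real LLM call, the `[doc1]`-style citation format is an assumption, and `with_hard_limits` is a hypothetical helper, not anything from a real library. The point is only that the outer check is plain deterministic code.

```python
import re
from typing import Callable


def with_hard_limits(
    model: Callable[[str], str], allowed_sources: set[str]
) -> Callable[[str], str]:
    """Wrap a model so answers that cite nothing, or cite unknown sources, are rejected."""

    def guarded(prompt: str) -> str:
        answer = model(prompt)
        # Assumed citation format: source IDs in square brackets, e.g. [doc1].
        cited = set(re.findall(r"\[(\w+)\]", answer))
        if not cited:
            return "REFUSED: answer cites no source"
        unknown = cited - allowed_sources
        if unknown:
            return f"REFUSED: unknown sources {sorted(unknown)}"
        return answer

    return guarded


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; always returns the same cited sentence.
    return "Water boils at 100 C at sea level [doc1]."


guarded = with_hard_limits(fake_model, allowed_sources={"doc1"})
print(guarded("At what temperature does water boil?"))
```

Note that this only checks that a known source is cited, not that the source actually says what the model claims, which is exactly the harder problem.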
It's really hard, though, because of the output. We're going from a binary pass/fail (1 or 0) in traditional programming to a continuous score somewhere between 0 and 1 for models. If I ask it to summarise something, whether it's correct depends not only on whether the summary sticks to the original text, but also on the reviewer's individual biases and wants/needs as to whether the summary is sufficient.
The continuous nature of an LLM's response does make it difficult to determine whether it's sufficiently factually correct, though. If it's not repeating the source word for word, you need to parse the output in order to check it... using another LLM (so the errors compound).
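For what it's worth, here's a crude sketch of a check that doesn't need a second LLM: score a summary by the fraction of its words that appear anywhere in the source. `support_score` and the example sentences are my own invention for illustration, and word overlap is a stand-in for real fact-checking, not a substitute for it.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def support_score(summary: str, source: str) -> float:
    """Fraction of summary tokens that also occur in the source (0.0 to 1.0)."""
    source_vocab = set(tokens(source))
    summary_tokens = tokens(summary)
    if not summary_tokens:
        return 0.0
    hits = sum(1 for t in summary_tokens if t in source_vocab)
    return hits / len(summary_tokens)


source = "The cat sat on the mat while the dog slept."
faithful = "The cat sat on the mat."
invented = "The cat chased the dog."

print(support_score(faithful, source))  # every word is in the source
print(support_score(invented, source))  # "chased" is not in the source
```

The invented summary still scores high because only one word is new, and a correct paraphrase in different words would score low. So you're back to picking a threshold on a 0 to 1 scale, which is the same problem again.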