by anonzzzies
1162 days ago
How does this work? Does anyone know? And for large swaths of things, how can it possibly work? For almost all code and APIs, for instance, there is no way to tell whether or not it is hallucinating. I see similar issues in many fields beyond pure facts, and with privacy as well.

It would appear that this is not automated monitoring but more like a second stage of human reinforcement learning, or perhaps a classifier. It seems that you create input/output examples, the LLM's responses are examined by a secondary system (which I'm guessing is probably NOT an LLM, otherwise it would be vulnerable to the same attacks), and that system perhaps forces the LLM to regenerate its response if it doesn't meet the classification threshold.
At least, that sounds more believable to me than someone claiming they’ve fixed the inherent flaws in LLMs.
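A minimal sketch of the speculated "generate, classify, regenerate" loop, under stated assumptions: `generate` and `classify` are hypothetical stand-ins (no real vendor API is implied), and the score threshold and retry limit are illustrative guesses, not known parameters of any product.

```python
from typing import Callable, Optional


def guarded_generate(
    prompt: str,
    generate: Callable[[str, int], str],    # hypothetical LLM call: (prompt, attempt) -> response
    classify: Callable[[str, str], float],  # hypothetical secondary classifier: (prompt, response) -> score
    threshold: float = 0.5,                 # assumed acceptance threshold
    max_attempts: int = 3,                  # assumed retry budget
) -> Optional[str]:
    """Return the first response whose classifier score meets the threshold,
    or None if every attempt is rejected."""
    for attempt in range(max_attempts):
        response = generate(prompt, attempt)
        # The classifier is a separate system; it only scores, never rewrites.
        if classify(prompt, response) >= threshold:
            return response
    # All attempts fell below the threshold: refuse rather than return a bad answer.
    return None


# Toy stubs for illustration only.
def toy_generate(prompt: str, attempt: int) -> str:
    return ["bad answer", "good answer"][min(attempt, 1)]


def toy_classify(prompt: str, response: str) -> float:
    return 1.0 if "good" in response else 0.0


print(guarded_generate("q", toy_generate, toy_classify))  # good answer
```

Note that this only filters outputs; it doesn't fix the underlying model, which is consistent with the skepticism above. The classifier can only catch what its training examples cover.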