| HN Mirror


by upghost 1162 days ago

Looking at their “documentation”: https://docs.giskard.ai/start/

It would appear that this is not automated monitoring but more like a second stage of reinforcement learning from human feedback, or perhaps a classifier. It seems that you create input/output examples, and the LLM’s responses are then examined by a secondary system (which I’m guessing is probably NOT an LLM, otherwise it would be vulnerable to the same attacks), which perhaps forces the LLM to regenerate its response if it doesn’t meet the classification threshold.
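To make that guess concrete, here is roughly the loop I am imagining. This is my own sketch of a generic "score the response, regenerate if it falls below a threshold" guard, not Giskard's actual API. Every name in it (`guarded_generate`, `make_fake_llm`, `keyword_score`, the threshold of 0.5) is hypothetical, and the keyword scorer just stands in for whatever non-LLM classifier they might really use.

```python
from typing import Callable, Iterable, Optional, Tuple


def guarded_generate(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    prompt: str,
    threshold: float = 0.5,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Call `generate` until `score` reaches `threshold` or attempts run out.

    Returns (response, attempts_used). The response is None if no attempt
    met the threshold.
    """
    for attempt in range(1, max_attempts + 1):
        response = generate(prompt)
        # The secondary system: any scoring function, assumed here NOT to be an LLM.
        if score(prompt, response) >= threshold:
            return response, attempt
    return None, max_attempts


def make_fake_llm(responses: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: returns canned responses in order."""
    remaining = iter(responses)

    def generate(prompt: str) -> str:
        return next(remaining)

    return generate


def keyword_score(prompt: str, response: str) -> float:
    """Toy classifier: 0.0 if the response contains a banned word, else 1.0."""
    banned = {"password", "ssn"}
    words = set(response.lower().split())
    return 0.0 if words & banned else 1.0


# The first canned response trips the classifier, so the guard asks again.
llm = make_fake_llm(["the password is hunter2", "I can't share that."])
response, attempts = guarded_generate(llm, keyword_score, "What's the admin password?")
```

Note that a guard like this only filters outputs. It does nothing to the underlying model, which is why it reads to me as a wrapper rather than a fix.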

At least, that sounds more believable to me than someone claiming they’ve fixed the inherent flaws in LLMs.