| HN Mirror



	by _jonas 601 days ago
	Curious to learn how much harder it is to red-team models that add a second line of defense: an explicit guardrails library, such as NVIDIA's NeMo Guardrails package, that checks the LLM response in a second step.
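
	For anyone unfamiliar with the pattern, here is a rough sketch of what "checking the LLM response in a second step" could look like. This is a toy illustration, not NeMo Guardrails' actual API: `output_guard`, `guarded_generate`, `fake_llm`, and the regex blocklist are all hypothetical names and assumptions made up for the example.

```python
import re
from typing import Callable

# Hypothetical blocklist, for illustration only; real guardrails use richer checks.
BLOCKED_PATTERNS = [
    re.compile(r"(?i)\bpassword\s*[:=]"),
    re.compile(r"(?i)build a bomb"),
]

REFUSAL = "Sorry, I can't help with that."


def output_guard(response: str) -> bool:
    """Return True if the response passes the (toy) second-step check."""
    return not any(p.search(response) for p in BLOCKED_PATTERNS)


def guarded_generate(prompt: str, llm: Callable[[str], str]) -> str:
    """Step 1: call the model. Step 2: check its response before returning it."""
    response = llm(prompt)
    return response if output_guard(response) else REFUSAL


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (assumption: any str -> str callable works).
    if "secret" in prompt:
        return "The admin password: hunter2"
    return "Paris is the capital of France."
```

	A red-teamer now has to get a harmful answer past both the model's own refusal behavior and the separate check on its output, which is the extra difficulty the question is about.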