HN Mirror



	by CarrieLab 1281 days ago
	I'm a PM at a human data company (https://www.surgehq.ai) that helps the large language model companies ensure their models are safe (we're the “clever prompt engineers” who helped Redwood assess their model performance). We just published a blog post today that includes our perspective on building “AI red teams” and best practices for AI alignment/safety: https://www.surgehq.ai/blog/ai-red-teams-for-adversarial-tra...

1 comment

nomel 1280 days ago

> helps the large language model companies ensure their models are safe

Here's the Merriam-Webster definition for the word you're using:

    ensure : to make sure, certain, or safe : GUARANTEE

"Ensure their models are safe" suggests you're using the "certain" sense of the word. Are you claiming that you can guarantee, with certainty (which requires proof), the safety of an LLM?
