Hacker News
by CarrieLab 1281 days ago
I'm a PM at a human data company (https://www.surgehq.ai) that helps the large language model companies ensure their models are safe (we're the “clever prompt engineers” who helped Redwood assess their model performance).

We just published a blog post today that includes our perspective on building “AI red teams” and best practices for AI alignment/safety: https://www.surgehq.ai/blog/ai-red-teams-for-adversarial-tra...

1 comment

> helps the large language model companies ensure their models are safe

Here's the Merriam-Webster definition for the word you're using:

    ensure : to make sure, certain, or safe : GUARANTEE

"Ensure their models are safe" suggests you're using the "certain" sense of the word. Are you claiming that you can guarantee, with certainty (which requires proof), the safety of an LLM?