| I ran the same AI security test 4 times against GPT-4. Every bypass - regardless of prompt - leaked the same credential: EPHEMERAL_KEY from OpenAI's Realtime API. This isn't random. It's training data leakage. The pattern:
- Different prompts (system introspection, chain-of-thought, trust building)
- Same result: "I can't disclose EPHEMERAL_KEY" (while disclosing it exists)
- Intermittent across runs (75% leak rate) Why this happens: OpenAI's Realtime API docs are in GPT-4's training data. When asked about "secrets" or "initialization", the model's highest-probability path leads to the most salient security example in its corpus: EPHEMERAL_KEY. Refusal training makes it worse: Models are trained to say "I cannot disclose [example secret]" - and they use real examples from training data. This is systemic:
- Can't be patched without retraining
- Affects ALL models trained on API documentation
- Tomorrow it's "session_token" or "project_key"
- Gets worse as APIs become more complex Real exploit path: Attacker learns EPHEMERAL_KEY exists → probes for generation flow → targets client-side implementations → session hijacking Cost to discover: $0.04 (60 tests across 4 runs) GitHub: https://github.com/SafetyLayer/safetylayer Built SafetyLayer to find these systematically. Free assessments available. |
What I've seen work is treating LLM API calls like you'd treat external logging — strip or redact anything that looks like a credential before it leaves your process. A simple regex on the request payload costs almost nothing and catches the lazy-paste case.
Are you seeing this as a widespread pattern in your testing, or did this surface from one specific integration?