Hacker News new | ask | show | jobs
Show HN: Teapot – A methodology for pen testing voice AI agents (redcaller.com)
7 points by xmhatx 126 days ago
Hello HN, I am Brian Cardinale, a penetration tester and security researcher at SecureCoders. We have been performing more and more AI based security assessments. We were presented a unique challenge of testing a system where the only interface was voice based, and as much as I like talking on the phone , we decided to create a test harness to facilitate the actual testing in a more systematic way. The technical test harness was the easy part, though. Creating test goals and attack strategies to help facilitate repeated and comprehensive testing became the real challenge. As such, we have been working on documenting our processes to share with the greater community and as a starting point for discussion. These systems present unique challenges where cleverness appears to be the name of the game. Such as suggesting for the agent to share its thoughts in “Inner Monologue” tags instead of “thinking” tags because those were specifically excluded in the agents prompt. Ya know, just silly things. Anyway, if reading is not your thing, I also did a walkthrough video of this methodology here: https://www.youtube.com/watch?v=XNmqCXsEc8Y

tl;dr: AI testing is tricky, we are documenting and sharing our tricks

Do you have any favorite AI jailbreak tricks?

5 comments

Interesting methodology. How much of this translates to the newer speech-to-speech models (like GPT-4o realtime) where there's no separate STT step? Seems like Phase 1 (Transcription Analysis) becomes less relevant when the model is processing audio natively. Does that make injection harder or just different?
Great question! It makes it more interesting! New attack angles are presented when dealing with the speech-to-speech models. Prosody, which are the intonation patterns that convey meaning, emotion, and emphasis beyond the literal words, comes into play! We have observed soft-spoken, gentle, and unsure requests often outperform authoritative statements in these systems. They also introduce potential attack surface such as background noises or phrases spoken as asides (like speaking to another person in the room) can impact the models understanding. This documentation started from testing a speech-to-speech model. You bring up an excellent point though. We will need to go back and re-frame this documentation to highlight the differences between testing TTS vs STS systems with some pointers on how to detect which type of system you are interacting with. Thanks for the question!
The system prompt hardening guide on their docs site is worth reading too (/docs/guides/system-prompt-hardening). The recommendation to put security rules last in the system prompt because of recency bias is counterintuitive but makes sense.
Definitely agree about it being counter-intuitive. The recency bias is very real! We have learned that prompt engineering can be quite nuanced! The other important item we have learned for prompts is delimiting into clear sections to give the model better contextualization of the instructions and information.
Very cool! Voice AI feels like the frontier of the frontier and isn't getting the attention needed.
We were surprised by this, as well! We ended up making our own tooling to test a speech-to-speech system because of this gap. Voice AI is becoming more and more prevalent with real security implications. ElevenLabs just started offering insurance specific to Voice AI agents for this very reason. This was very, very recent news (Feb 12, 2026). We wrote an article about this earlier this week. https://www.securecoders.com/blog/voice-ai-insurance-aiuc1-c...
Nice. Seems intriguing.
Thanks! We will be updating this regularly. We have a discord channel to join to keep up with updates as well! Cheers! https://discord.gg/Cv3sB6xgtt
Nifty!
Nifty and schwifty, ftw!