|
|
|
|
|
by social_quotient
455 days ago
|
|
I think you’re spot on! We’re using a similar trick in our system to keep sensitive info from leaking… specifically, to stop our system prompt from leaking. We take the LLM’s output and run it through a RAG search, similarity search it against our actual system prompt/embedding of it. If the similarity score spikes too high, we toss the response out. It’s a twist on the reverse RAG idea from the article and maybe directionally what they are doing. |
|