|
|
|
|
|
by logiduck
882 days ago
|
|
For the chevy tahoe example, you are referencing the dealership, but in that case it wasn't a case of the implementation failing to do a positive test for fact extraction, but to test the guardrails. Aren't the guardrail tests much harder since they are open-ended and have to guard against unknown prompt injections and the test of facts much simpler? I think a test suite that guards against the infinite surface area is more valuable then testing if a question matches a reference answer. Interested to how you view testing against giving a wrong answer outside of the predefined scope as opposed to testing that all the test questions match a reference. |
|
We have a couple of different test generation strategies. As you can see in the demo and examples, the most basic one is "ask about a fact".
Two of our other strategies are closer to what you're asking for:
1. tests that try to deliberately induce hallucination by implying some fact that isn't in the knowledge base. For example "do I need a pilots license to activate the flight mode on the new chevy tahoe?" implies the existence of a feature that doesn't exist (yet). This was really hard to get right, and we have some coverage here but are still improving it.
2. actively malicious interactions that try to override facts in the knowledge base. These are easy to generate.