by nialse
205 days ago
Nothing points out that the benchmark is invalid like a zero false positive rate. Seemingly it is pre-2020 text vs. a few models' reworkings of texts. I can see this model falling apart in many real-world scenarios. Yes, LLMs use strange language if left to their own devices, and this can surely be detected. But a 0% false positive rate under all circumstances? Implausible.
Find me a clean public dataset with no AI involvement and I will be happy to report Pangram's false positive rate on it.