by jlukecarlson
974 days ago
To the comments about this being easy to defeat: when it comes to detecting whether a person submitted a document containing LLM-generated text (a legal document, school essay, work document, etc.), the real value in a technique like this is high precision, not necessarily high recall. Yes, many people can circumvent this simple watermark technique, but for those who don't, it is essentially guaranteed that they used an LLM if their text has clearly atypical Unicode marks (whether U+2004, ligatures, or variation selectors). Thus an organization can feel confident in taking action against the individual who submitted the document. Right now, by contrast, there are a bunch of dubious "LLM detector" models that output a confidence score that may or may not correspond to whether the person used an LLM. That low-precision approach leads to people getting incorrectly accused of using LLM content. So in my opinion, a world of high-precision (but potentially low-recall) LLM watermarks using simple techniques is way better than the current high-noise, low-precision black-box world of low-quality "LLM detector" models.
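
To make the precision point concrete, here is a rough sketch of what the checking side could look like. The function names and the exact set of code points are my own assumptions for illustration, not anything a real LLM vendor or detector is known to use. It just flags the three kinds of marks mentioned above: U+2004, the Latin ligatures at U+FB00 to U+FB06, and variation selectors.

```python
import unicodedata

# Hypothetical, illustrative set of "atypical" marks; a real scheme would define its own.
SUSPICIOUS_SPACES = {0x2004}  # U+2004 THREE-PER-EM SPACE


def is_variation_selector(cp: int) -> bool:
    # Variation Selectors block and Variation Selectors Supplement block.
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def is_latin_ligature(cp: int) -> bool:
    # Latin ligatures in Alphabetic Presentation Forms (ff, fi, fl, ffi, ffl, long s-t, st).
    return 0xFB00 <= cp <= 0xFB06


def find_watermark_marks(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each atypical mark in text."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in SUSPICIOUS_SPACES or is_variation_selector(cp) or is_latin_ligature(cp):
            hits.append((i, f"U+{cp:04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


def looks_watermarked(text: str) -> bool:
    # Any hit counts as a positive; an empty result says nothing either way (low recall).
    return bool(find_watermark_marks(text))
```

Note the asymmetry: a hit is strong evidence, while no hits tells you nothing, since the marks are trivial to strip. A real deployment would also need to allow for legitimate uses of these characters (U+FE0F after emoji, ligatures pasted from PDFs) before treating a hit as proof.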