Thanks for asking these important questions!

* What are the challenges surrounding verification that your system functions properly, given that the test material is illicit?

You're right that storing child sexual abuse material (CSAM) is illegal unless you are the National Center for Missing and Exploited Children (NCMEC) or law enforcement. What is legal is maintaining hashes of known CSAM. NCMEC, law enforcement, and large tech companies maintain their own datasets of known CSAM hashes and, where appropriate, share them. The Technology Coalition [1] has more information on this.
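Because matching only needs hashes, a platform never has to hold the images themselves. Here is a minimal sketch of the idea. It assumes a simple 64-bit average hash (aHash) as a stand-in for whichever perceptual hash is actually used (this is not Perception's API), and it assumes the image has already been resized to 8x8 grayscale. The names `average_hash`, `hamming_distance`, and `matches_known` are hypothetical, and the 4-bit threshold is an arbitrary illustrative value:

```python
from typing import Iterable, List


def average_hash(pixels: List[List[int]]) -> int:
    """Compute a 64-bit average hash (aHash) from an 8x8 grayscale grid.

    Assumption: the image was already resized to 8x8 and converted to
    grayscale (values 0-255); a real pipeline would do that step first.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # Each bit records whether a pixel is brighter than the image mean.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known(h: int, known: Iterable[int], threshold: int) -> bool:
    """True if h is within `threshold` bits of any hash in the known list."""
    return any(hamming_distance(h, k) <= threshold for k in known)


if __name__ == "__main__":
    # Hypothetical example images: a gradient and a brightened copy of it.
    original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
    brighter = [[min(255, p + 10) for p in row] for row in original]
    # Only the hash of the original is stored, never the image itself.
    known = {average_hash(original)}
    print(matches_known(average_hash(brighter), known, threshold=4))
```

Two hashes within the threshold count as a match, so edits that flip only a few bits are still caught, while unrelated images typically differ in many bits.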
That said, we can and do verify that the system works properly by simulating it in bench tests with legal content [2].

* Can you speak to the reliability of the system in a sensitivity/specificity kind of way? In other words, what are the false positive and false negative rates?

In practice, the false positive rate is very low. We set our thresholds based on bench tests, targeting an expected false positive rate below 1/1000; the exact threshold depends on which hash function is used. Different hash functions are more resilient to some transformations (e.g., cropping, watermarks) than others. The false negative rate depends entirely on the kind of modification made to the image; for many common operations, it is close to zero.

* Are you aware of any large organizations leveraging your solution?

Thorn builds technology to defend children from sexual abuse, and one of the products we build for this purpose is Safer [3]. Perception provides an easy way to get started using the Safer matching service. Safer provides a more robust and complete solution, including content queue handling and reporting tools. Some organizations that use Safer are Imgur, Flickr, and Slack. However, perceptual hashing is also used by many companies that don't use our tools; our goal is simply to make it easier for more people to get started.

* Do you feel that the availability of these tools obligates service providers to use them, either morally or legally?

I'm not a lawyer or a public policy expert, but as I understand it, the law requires companies to report CSAM once they become aware of it. Working in this field, I've learned two things pertinent to this question: (1) most people don't know how pervasive this issue is, and (2) there aren't many easy ways to start protecting your platform from this abuse. No one wants the cool new products and platforms they make to be used to abuse children.
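The threshold-setting described in the answer on false positive rates can be sketched as follows. This assumes hashes are compared by Hamming distance and that bench testing yields distances for pairs of distinct images (non-matches) and for transformed copies of the same image (true matches). The function names and sample distances are hypothetical and made up for illustration; they are not taken from Perception's benchmarking tools:

```python
from typing import Sequence


def false_positive_rate(nonmatch_distances: Sequence[int], threshold: int) -> float:
    """Fraction of distinct-image pairs that would be flagged at `threshold`."""
    return sum(d <= threshold for d in nonmatch_distances) / len(nonmatch_distances)


def false_negative_rate(match_distances: Sequence[int], threshold: int) -> float:
    """Fraction of transformed same-image pairs that would be missed at `threshold`."""
    return sum(d > threshold for d in match_distances) / len(match_distances)


def pick_threshold(
    nonmatch_distances: Sequence[int], max_fpr: float, max_bits: int = 64
) -> int:
    """Largest threshold whose empirical false positive rate stays below max_fpr.

    Returns -1 if even a threshold of 0 exceeds the target.
    """
    best = -1
    for t in range(max_bits + 1):
        if false_positive_rate(nonmatch_distances, t) < max_fpr:
            best = t
        else:
            # The rate never decreases as t grows, so no larger t can qualify.
            break
    return best


if __name__ == "__main__":
    # Illustrative, made-up distances; real values would come from bench
    # tests on legal images, with one threshold per hash function.
    nonmatch = [28, 31, 33, 30, 35, 29, 32, 27, 34, 36]
    match = [0, 2, 3, 1, 9, 4]
    t = pick_threshold(nonmatch, max_fpr=1 / 1000)
    print(t, false_negative_rate(match, t))
```

Estimating a rate as small as 1/1000 with any confidence requires many thousands of non-matching pairs, far more than the ten in this toy example.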
Privacy is important too, which is why solutions that preserve privacy and avoid leaking private information to third parties are critical. Perceptual hashing allows us to do both: protect platforms from this abuse and preserve privacy.

[1] https://www.technologycoalition.org/
[2] https://perception.thorn.engineering/en/latest/examples/benc...
[3] https://getsafer.io

EDIT: Line breaks