by candiddevmike 1550 days ago
So the folks at the forefront of deepfake technology (i.e., the attackers you're targeting) will slip through your product because it lags behind the state of the art (like AV, which you said is the approach you're following). Meanwhile, innocent folks will be caught by it in a new Kafkaesque version of "prove you're not a bot", since you focus on reducing false negatives. Hopefully I can avoid companies using your product.
2 comments

Retrospective, antivirus-style techniques are still useful, though. Not every actor is a state-level actor, and even for those that are, detecting their previous exploits/models out of the box forces them to "burn" their state-of-the-art ones, which slows down their abuse.

And realistically, since deepfake detection will inevitably be more expensive than captchas or antivirus scanning, this will be adopted by human-in-the-loop organizations for critical processes where threat scoring or moderation is already being applied.

That said - Reality Defender, please train your system on diverse human data sets, do not release models where ethnicity or gender (including gender identity) are nontrivially correlated with deepfake score, and have processes in place from day 1 to allow users to report suspected patterns of bias. The kafkaesque "prove you're not a bot" scenario envisioned by the parent poster is one thing for holistic human-in-the-loop verification processes, and another thing if it suppresses minority voices and minority access to government services.
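To make that request a bit more concrete, here is a minimal sketch of one possible check: compare how often genuine (non-fake) media gets flagged across demographic groups. Everything here is an assumption for illustration, not anything Reality Defender has described: the `(group, score, is_fake)` record layout, the group labels, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def false_positive_rates(records, threshold=0.5):
    """Per-group share of genuine samples flagged as deepfakes.

    records: iterable of (group, score, is_fake) tuples (hypothetical layout).
    threshold: score at or above which a sample counts as flagged (assumed).
    """
    flagged = defaultdict(int)
    genuine = defaultdict(int)
    for group, score, is_fake in records:
        if is_fake:
            continue  # only genuine media can produce a false positive
        genuine[group] += 1
        if score >= threshold:
            flagged[group] += 1
    return {group: flagged[group] / genuine[group] for group in genuine}


def max_fpr_gap(rates):
    """Largest difference in false positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Made-up evaluation data, purely to show the shape of the check.
sample = [
    ("group_a", 0.10, False),
    ("group_a", 0.70, False),
    ("group_a", 0.20, False),
    ("group_a", 0.30, False),
    ("group_b", 0.80, False),
    ("group_b", 0.60, False),
    ("group_b", 0.20, False),
    ("group_b", 0.40, False),
    ("group_b", 0.95, True),
]

rates = false_positive_rates(sample)
print(rates)
print(max_fpr_gap(rates))
```

A nonzero gap on a small sample proves nothing by itself, but tracking a number like this per release is one way to notice when innocent users from one group are being challenged more often than others.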

We agree. Dataset fidelity and bias are major concerns for publicly available datasets. For this reason we are working to develop programmatically created datasets along with anti-bias testing and policies.
"Bias" and "anti-bias" is a slippery snake that will bite you as soon as it warms up to you.
Of course, because these companies are probably owned, in the end, by the same people who develop the DeepFake datasets, generating endless income from both sides.

It's like ADA Compliance lawsuits. I can't prove that accessiBe or other "ADA Compliance" web tooling vendors are generating these lawsuits, but their companies would not exist without them. Why wouldn't they want more lawsuits?

The majority of large, popular datasets in deep learning are curated and hosted by academics:

https://paperswithcode.com/task/deepfake-detection#datasets

Thanks, yes, we benchmark on these research datasets as well.