by bpcrd 1548 days ago
In 2017 deepfakes were pretty crude; today the average person can't tell a real face from a deepfake generated on a 5-year-old iPhone. We expect the tech to continue moving in this direction. So, similar to anti-virus, we're approaching this problem with an iterative, multi-model solution that can evolve with the threat.
2 comments

So the folks at the forefront of deep fake technology (i.e. the attackers you're targeting) will slip through your product because it lags behind the state of the art (like AV, which you said is the approach you're following), while innocent folks will be caught by it due to a new kafkaesque version of "prove you're not a bot" since you focus on reducing false negatives. Hopefully I can avoid companies using your product.
Retrospective antivirus-esque techniques are still useful, though, as not every actor is a state-level actor. Even for state-level actors, forcing them to "burn" their state-of-the-art exploits/models, because previous exploits/models are detected out of the box, slows down their abuse.

And realistically, since deepfake detection will inevitably be more expensive than captchas or antivirus scanning, this will be adopted by human-in-the-loop organizations for critical processes where threat scoring or moderation is already being applied.

That said - Reality Defender, please train your system on diverse human data sets, do not release models where ethnicity or gender (including gender identity) are nontrivially correlated with deepfake score, and have processes in place from day 1 to allow users to report suspected patterns of bias. The kafkaesque "prove you're not a bot" scenario envisioned by the parent poster is one thing for holistic human-in-the-loop verification processes, and another thing if it suppresses minority voices and minority access to government services.
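
To make the "nontrivially correlated" check concrete, here is a rough sketch of one way such an audit could look: compare the false positive rate on genuine (non-deepfake) samples across groups. Everything here is an assumption for illustration, including the function names, the 0.5 threshold, and the made-up sample records; it is not Reality Defender's method.

```python
from collections import defaultdict

def false_positive_rate_by_group(records, threshold=0.5):
    """records: iterable of (group, score, is_fake) tuples (hypothetical schema).
    Returns {group: share of genuine samples flagged as fake}."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, score, is_fake in records:
        if is_fake:
            continue  # only genuine media can produce a false positive
        total[group] += 1
        if score >= threshold:
            flagged[group] += 1
    return {g: flagged[g] / total[g] for g in total}

def max_fpr_gap(rates):
    """Largest difference in false positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Made-up scores, purely for illustration.
sample = [
    ("group_a", 0.10, False), ("group_a", 0.70, False),
    ("group_a", 0.20, False), ("group_a", 0.30, False),
    ("group_b", 0.60, False), ("group_b", 0.80, False),
    ("group_b", 0.20, False), ("group_b", 0.40, False),
    ("group_b", 0.90, True),
]
rates = false_positive_rate_by_group(sample)
gap = max_fpr_gap(rates)
print(rates, gap)
```

A large gap between groups would be the kind of pattern worth blocking a release on, though what counts as "large" is a policy decision this sketch does not settle.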

We agree. Dataset fidelity and bias are major concerns for publicly available datasets. For this reason we are working to develop programmatically created datasets along with anti-bias testing and policies.
"Bias" and "anti-bias" is a slippery snake that will bite you as soon as it warms up to you.
Of course, because these companies are probably owned by the same people in the end that develop the DeepFake datasets, generating endless income from both sides.

It's like ADA Compliance lawsuits. I can't prove that accessiBe or other "ADA Compliance" web tooling vendors are generating these lawsuits, but their companies would not exist without them. Why wouldn't they want more lawsuits?

The majority of large, popular datasets in deep learning are curated and hosted by academics:

https://paperswithcode.com/task/deepfake-detection#datasets

Thanks, yes, we benchmark on these research datasets as well.
Biggest fear is people being put in prison based on deepfake video evidence.
At some point it might make video evidence inadmissible. I imagine a tool like Reality Defender offers could be used in court to tackle this, but now Reality Defender itself needs verification. Like all tools used to verify evidence I suppose. Is a negative from Reality Defender enough to prove that a video is real, or is it only valid proof when it's proving it's fake?
It's not about verification, it's about reasonable doubt.
Right, but verification is the opposite direction of doubt. It's not that verification is a boolean true or false for the evidence, it just helps remove doubt. For example, CCTV footage verified to be pulled from a service station by the police is less easy to doubt than CCTV footage provided by a friend of the accused. Adding a verification paper trail to that makes it even harder to doubt. DNA test results provided by the defendant are worth nothing unless they're verified by a paper trail that's trusted.

So back to Reality Defender, why shouldn't we doubt Reality Defender's positive or negative results? There would need to be a period of verification and testing that "proves" within a reasonable margin that it works.
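
One way to picture "removes doubt" rather than "proves": treat the detector as a test with measured error rates and apply Bayes' rule. This is a back-of-the-envelope sketch, and the prior, detection rate, and false alarm rate below are invented numbers, not measurements of any real product.

```python
def posterior_fake(prior_fake, tpr, fpr, flagged):
    """Probability a video is fake given the detector's verdict.

    prior_fake: belief the video is fake before running the detector
    tpr: P(flagged | fake), fpr: P(flagged | real); both assumed known
    flagged: True if the detector says "fake", False if it says "real"
    """
    if flagged:
        num = tpr * prior_fake
        den = num + fpr * (1 - prior_fake)
    else:
        num = (1 - tpr) * prior_fake
        den = num + (1 - fpr) * (1 - prior_fake)
    return num / den

# Invented figures: 1% of submitted videos are fake, the detector
# catches 90% of fakes and wrongly flags 5% of real videos.
p_if_flagged = posterior_fake(0.01, 0.90, 0.05, flagged=True)
p_if_cleared = posterior_fake(0.01, 0.90, 0.05, flagged=False)
print(round(p_if_flagged, 3), round(p_if_cleared, 5))
```

With these invented numbers, a "fake" verdict still leaves the video more likely real than fake, and a "real" verdict lowers the probability of a fake without taking it to zero. The period of verification and testing is what would supply trustworthy values for `tpr` and `fpr` in the first place.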

Great comments - Better results require more data, improving models, and onboarding new (different) models. All areas of focus for us!