by vivekkalyan 2198 days ago
The rules seem pretty clear that consent is required from any persons appearing in any external datasets that are used. The winners scraped data from YouTube videos, so I am not sure what the issue is.

The more worrying takeaway is that the winners scraped videos from people who clearly had no intention of their videos being used for a deepfake detection algorithm. Yet they did not think of the ethical considerations of using that data (did everyone in the video even have a say in the video being uploaded?). I think Kaggle disqualifying the team is the right move (even if it's a painful one for the winners).

6 comments

The article states the videos used a Creative Commons license that allowed for commercial use. It is an extremely liberal license that does not state "free for commercial use except for when used with facial recognition."
For people in a video, you need a model release from them. This is also a mistake many people make: they use Creative Commons licenses and think they are safe. A picture or a video needs model releases for the people in it (several exemptions apply).
If that is true, then basically all the photographs in Wikipedia are illegal, since the only check they do is for copyright, not for model releases. Pretty sure that's not a legal requirement.
Of which photos are you thinking? It most certainly varies from country to country but public figures or random people captured when taking a picture of a landscape or a building are at least in some countries not subject to such rules.
But that Creative Commons licence was issued by the copyright holder of those videos, not the people in them. The people in those videos may not even have agreed to appear in the video if they were in a public place (the relevant legal term, at least here in the UK, is "reasonable expectation of privacy"). So if Kaggle requires people in the videos to consent to taking part, then that consent cannot be inferred from that licence.

What's more, if that consent is not legally required (there's a heavy "if" in this sentence; IANAL, so I don't pretend to know whether it's required, e.g. under GDPR, but let's assume for a moment that it's not), then Kaggle are still perfectly within their rights to ask for that permission to qualify for their competition. After all, it's their competition, and it's totally reasonable for them to set ethical criteria that are even higher than legally required.

You're right, I missed that part of their rules. Looks like they did probably break them.

"A. If any part of the submission documentation depicts, identifies, or includes any person that is not an individual participant or Team member, you must have all permissions and rights from the individual depicted, identified, or included and you agree to provide Competition Sponsor and PAI with written confirmation of those permissions and rights upon request."

Yeah, with $1 million at stake, I can't believe this team of really smart people made such an incredible blunder.

The whole reason Facebook launched this challenge was to try and bury the bad PR over their data practices. If people in the external datasets had complained about the unauthorized use of their faces in the winning solution, it would've been pretty embarrassing for FB.

Note that isn't part of the rules. It's part of the "Winning submission documentation requirements" which is a separate document and wasn't mentioned at all on the "external data" Kaggle thread, which had Kaggle moderators explaining the rules.

Documentation requirements are pretty standard in Kaggle competitions, and usually cover having to supply your code, and maybe write a blog post about it. I've never seen one that had major rules in it.

I'm with you here. There are ethical concerns, legal concerns for productization, and overall this defeats the purpose of creating novel algorithms rather than a better trained model.

For instance, with the same scraping being used to train the deepfake GAN, would their model be more or less effective than a competitor model?

It seems like they won because of a disparity in data, not an innovative technical approach.

It's much better they learn now by being banned from a competition rather than having a lawsuit filed against them in the future.

The correct decision was made.

What if you took commercial video, like a news broadcast, instead of YouTube? Would that still be off limits?