> So it shouldn't matter.

It definitely does matter that any secrets generated were already public:

* Emitting secrets from private repos would be a huge confidentiality issue (though really you shouldn't commit secrets to git at all), as it'd be taking something that's private + exploitable and making it public.

* Emitting secrets that are already public doesn't cause the confidentiality issue. Once a secret is out, it's out, and it should be changed immediately. By the time it's in Copilot's training set, it'll already have been on search engines/archive sites/black-hat forums/etc.

Tangentially, GitHub also does some scanning to alert users to accidentally committed secrets in repos: https://docs.github.com/en/code-security/secret-scanning/abo...

> 2. Already public unintentionally.

Right, but therefore already compromised and no longer confidential. Copilot isn't leaking any secrets; someone else did by making them public.

> AFAIK, any code that I produce is automatically copyrighted to me. This means if I write something in public and not provide a license, IT IS LEGALLY under the copyright protection provided to me by my country. At least that is the case in US and India which are home to a huge portion of OSS.

Essentially correct, to my understanding. If you're making it public, you'll generally also grant some hosting/publishing/distribution rights to the services involved, as specified by their T&C.

> Reproducing it and remixing my work would be illegal

The US has the concept of fair use, which provides exceptions for "transformative" purposes. For example: copying and downscaling your image to use as a thumbnail, caching the webpage your work is on, or creating a parody of your work.

Consider Google Books, for example, where Google scanned millions of copyrighted books and made them searchable (showing snippets). This was ruled fair use due to being transformative. The question would be whether code generated by Copilot falls under this.
Ultimately it's up to the courts to decide, but I'd lean in favor of "yes".

> The PII and secrets is enough to find out the license of a repo which would make it easier to prove whether they violated it or not. Don't tell me all OSS that copilot has trained on is only public domain stuff. Even ISC license needs attribution.

Fair use is about unlicensed usage, so if it's fair use, then it doesn't need to abide by the terms of the licenses. Even if it's ruled not to be fair use, I think they could still train it on GitHub-hosted code due to the aforementioned rights you grant them by agreeing to GitHub's T&C.

> How does that change anything I mentioned? Very curious.

It changes your claim of impossibility, so now it's just a question of whether there's a violation.