Hacker News
by KronisLV 1775 days ago
Here's an excerpt from the linked FSF blog article: https://www.fsf.org/blogs/licensing/fsf-funded-call-for-whit...

  Areas of interest
  While any topic related to Copilot's effect on free software may be in scope, the following questions are of particular interest:
    - Is Copilot's training on public repositories infringing copyright? Is it fair use?
    - How likely is the output of Copilot to generate actionable claims of violations on GPL-licensed works?
    - How can developers ensure that any code to which they hold the copyright is protected against violations generated by Copilot?
    - Is there a way for developers using Copilot to comply with free software licenses like the GPL?
    - If Copilot learns from AGPL-covered code, is Copilot infringing the AGPL?
    - If Copilot generates code which does give rise to a violation of a free software licensed work, how can this violation be discovered by the copyright holder on the underlying work?
    - Is a trained artificial intelligence (AI) / machine learning (ML) model resulting from machine learning a compiled version of the training data, or is it something else, like source code that users can modify by doing further training?
    - Is the Copilot trained AI/ML model copyrighted? If so, who holds that copyright?
    - Should ethical advocacy organizations like the FSF argue for change in copyright law relevant to these questions?
While I do believe that the topic is definitely worthy of discussion, my question would be a bit different.

If the tooling is already pretty capable, wouldn't just ignoring all of the ethical questions lead to having a market advantage? Say some company doesn't care how the tool was trained or what that implies, and just uses it to have its developers write software at 1.25x the speed of the competition, knowing that no one will ever examine its SaaS codebase or care about license compliance. Wouldn't that mean that it'd also be more likely to beat its competition to market? Ergo, wouldn't NOT using Copilot or tools like Tabnine put most others at a disadvantage?

Personally, I just see that as the logical and unavoidable progression of development tooling, the other issues notwithstanding, very much like IDEs became commonplace with their refactoring tooling and autocomplete.

I've worked with Visual Studio Code on large Java codebases, and I've also used Eclipse, NetBeans and, in the past few years, IntelliJ IDEA; with each new tool I found that my productivity increased noticeably. Now it's at a point where the IDE suggests not only a variety of fixes for the code itself, but also for the tooling, such as installing Maven dependencies, adding new Spring configurations and so on. It would be hard to imagine going back to doing things manually, and it feels like in time it'll be very much the same with regard to language syntax or looking at documentation for trivial things. After all, I'm paid to solve problems, not to sit around and ponder how to initialize some library.

2 comments

The actionable claims question is the hot question; the rest is sort of answered by that indirectly. It's mainly interesting because a positive answer could cause commercial entities to ban the use of Copilot (and similar tools) in their organizations to avoid such claims. So it could potentially be very damaging. Stack Overflow would be a nice example of a place where people learn from each other and where, no doubt, bits of IP from companies and OSS repositories get mingled as well.

My impression is that these claims would not be actionable for a few simple reasons:

- The generated code is pretty small.

- The generated code is adapted to the context (i.e. not a verbatim copy).

- The generated code would be common to many repositories and not just one.

Because of all of the above, tracing any code fragment to a specific repository and then defending a claim would probably be very hard/impossible. Copyright is about the form of things and if it's not a verbatim copy of something really unique, it's hard to make the case for an infringement.

> knowing that no one will ever examine their SaaS codebase and won't care about license compliance

Everyone thinks this until they become the next Linksys, and have to crack open their entire tech stack because someone reverse engineered the text of the GPL in their firmware...

Frankly, I doubt that most software projects out there get that sort of attention. Aside from that, it's also very likely that management and the legal departments of most orgs don't even inspect the licenses of all the libraries that closely.

Not saying that I condone it or anything like that. However, it does feel like these things will oftentimes be ignored because there is no regulatory body that'd inspect all codebases for compliance (even the idea of which doesn't feel feasible).

Because of that, cases where someone has both the skills to decompile a codebase and also has an axe to grind seem like the exception, rather than the norm.

In the Linksys case no decompiling was even necessary. The plain text of the GPL license was present in the firmware image. Grep is a great tool for this sort of thing that everyone has access to :)
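
For anyone curious what that kind of check looks like, here's a rough Python sketch of the same idea as grepping a binary for license text. The marker strings are just a few phrases that appear in the GPL text, and the function names and the firmware.bin path are made up for illustration. It only works if the license text sits uncompressed in the image; a compressed or encrypted image would need unpacking first.

```python
from pathlib import Path

# Illustrative byte strings that appear in the GPL text (an assumption,
# not an exhaustive or authoritative list of license markers).
GPL_MARKERS = (
    b"GNU GENERAL PUBLIC LICENSE",
    b"GNU General Public License",
    b"Free Software Foundation",
)


def find_license_markers(data, markers=GPL_MARKERS):
    """Return sorted (offset, marker) pairs for each occurrence of a marker in data."""
    hits = []
    for marker in markers:
        start = data.find(marker)
        while start != -1:
            hits.append((start, marker.decode("ascii")))
            # Continue searching just past the start of the previous match.
            start = data.find(marker, start + 1)
    return sorted(hits)


def scan_image(path):
    """Read a (hypothetical) firmware image from disk and scan its raw bytes."""
    return find_license_markers(Path(path).read_bytes())


if __name__ == "__main__":
    # "firmware.bin" is a placeholder path, not a real file from the thread.
    for offset, marker in scan_image("firmware.bin"):
        print(f"{offset:#010x}  {marker}")
```

A hit only tells you the license text is in there somewhere; it doesn't tell you which component it belongs to or whether the vendor is actually out of compliance.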