by unreal37 1132 days ago
That's quite an extreme set of statements, and I very much doubt what you consider "illegal" is actually illegal.

When you publish something for others to view (text, images, code, whatever), others are allowed to view it. You can't anticipate how others view it, with their eyes or with screenreaders to assist. You can't stop them from reading it, thinking about it, discussing it with their friends, taking notes, summarizing it. You can't stop people from learning from your published content or recognizing patterns between it and other similar things.

Sorry, but you can't create a license that says "I will allow you to view this but you cannot learn from it. If you learn from it, you need to pay me."


Learning is very different from copying. I can take a movie and convert it to different formats and resolutions. I can use AI algorithms to remove rough edges, and even add color to images that were taken in black and white. None of that would be covered by the word "learning", even if the program takes the movie as input, learns from it, and outputs a work that is completely different from the original.

The words that seem to fit best are "transforming" and "adapting". In order to adapt something, one first has to learn from the original in order to produce the derivative work. This is, however, covered by copyright, since transforming and adapting are still considered a form of copying, even if all that happened was learning and producing something unique but similar to the original.

The license can say: "I will allow you to view this, but you cannot create a derivative work from it."

This isn’t about a person learning, however. This is about developing an algorithm by including GPL-licensed code, an algorithm that might emit (and has emitted) that code verbatim. Those seem materially different to me.
You can copy, verbatim and without attribution, the parts of GPL code that are not covered by copyright, such as anything purely functional, like an optimized sorting algorithm.

Copyright is for art. Patents are for utilities and tools.

The art in GPL code is in the arbitrary decisions made about how to structure that code… the class structure and not the algorithms.

You cannot copyright an algorithm, and for very good reason. Imagine if Microsoft had the powers people assume the GPL grants!

Microsoft is not training its code autocomplete only on the parts of GPL/MIT/etc. code that are not covered by copyright. It is training it on entire codebases.
What part of the codebase are the tools reproducing? The copyrightable aspects of software are generally at the structural level and not at the function level, as most individual functions are utilitarian rather than expressive in nature.

If these tools were not context-dependent, they would not be very useful. These tools aim to reproduce only the non-copyrightable aspects of code, in a context-aware manner.

I have yet to see a case where Copilot has returned code that is something other than the kind of functional, utilitarian code that is explicitly not covered by copyright.

Patents? Perhaps! But that’s another discussion.

If the purpose of processing copyrighted works is to learn the underlying structure and produce further works that are not independently derivative, then the courts have a history of ruling in favor of fair use.

Copyright is about artistic expression and not functionality.