by sircastor 936 days ago
For this reason, and a few others, my workplace simply put a blanket ban on these kinds of tools. If our code is never exposed to the learning tool, it’s never in danger of showing up somewhere else.

Incidental to that, I feel like these tools expose the reality behind “copyrighting code/math” and how fallacious it is. If the tool can generate the efficient methods of achieving a result, I think it becomes obvious that one shouldn’t be able to protect it via IP law.

6 comments

Just like with social media, all it takes is one person to not honor that request, and boom! your shit is out there. Sure, you can fire the offending party, but you can't just ask Co-pilot to not use your contributions. That's like asking the internet to give those pictures back. It ain't gonna happen.
It's quite different from the analogy you suggest, as copilot is controlled by a single organisation and we know the address.
I’m assuming you’re implying that a firewall rule can be applied to block access from the corp network. However, this is clearly ignoring the fact that work from home exists where the corp network can be bypassed.
> If the tool can generate the efficient methods of achieving a result, I think it becomes obvious that one shouldn’t be able to protect it via IP law.

But these kinds of tools can only do that because someone else already put in the work to write the solutions that are used to train their models. Isn't this exactly the kind of situation when copyright is supposed to apply?

But with enough training data, it's not generating it because it remembers the exact code line for line, it does it because it knows that to be a good method. Especially if you ask it to refactor it, that's a whole new creation even if it's been done before by some engineer somewhere.
It's still parroting what other people did, it's not doing any math reasoning, and it's not any different to LLMs seemingly able to compose prose or poetry.

If you want to make an argument that math or software shouldn't be copyrighted, LLMs actually make the case for stronger copyright protections.

> If you want to make an argument that math or software shouldn't be copyrighted, LLMs actually make the case for stronger copyright protections.

Maybe, but as long as managers and shareholders all over the world are excited about the upside of the new technology, this is very unlikely to happen. ;)

LLMs would be dead in the water legally, if their owners had to account for every bit of IP the LLMs have been trained with.

If you use GitHub, you feed OpenAI with your code as training data already, with GitLab you do the same for Google.
> If you use GitHub, you feed OpenAI with your code as training data already, with GitLab you do the same for Google.

Do you have some evidence that github trained copilot on private repositories? They've been pretty clear about claiming they only used public repos.

Also, gitlab is not owned by Google AFAICT but is instead a publicly traded company.

Some people complained that CoPilot outputted their rare code almost verbatim so I have no reason to trust whatever GitHub/Microsoft state.
If you're going to make these kinds of accusations (the kind that, if proved true, would lead to multi-million dollar lawsuits), you should at least try to provide sources.
One example:

https://devclass.com/2022/10/17/github-copilot-under-fire-as...

Also there are multiple lawsuits already as you've probably noticed.

That is an example of GitHub allegedly violating the license of public repositories.

The claim that was made was that GitHub trained using private repositories and I have yet to see any evidence.

Why don't you just look it up? It was on HN front page some time ago.
I tried, all I could find was one other HN comment asking about it. Admittedly I could have tried harder but it really isn't my assertion to defend.

I think there might be some confusion here between private repositories and public repositories with restrictive licenses. There is evidence of the latter but not the former.

If you use on-prem gitlab, presumably that is not the case.
Self-hosted LLM is really the only way to do this.
>If the tool can generate the efficient methods of achieving a result, I think it becomes obvious that one shouldn’t be able to protect it via IP law.

Why does that only hold when the result in question is in software? Machines are just tools for achieving results.

Because you can patent a machine. The argument is that software is "just math" (because it literally is just doing binary arithmetic) and mathematics cannot be patented.
Math should be patentable, too. I see no reason why not.

The old argument that it's discovered rather than invented is bullshit. Multiple people can always have the same idea for an invention because we think alike and live in the same environment.

Or just ban patents altogether. Of course, this may discourage companies from investing in R&D and that's the real problem: how expensive is it to invent something, and does it justify a 20-year monopoly? But there are no good answers here, and trying to draw a line between math and non-math is bollocks.

There’s just something obscene about patenting mathematics. The universe gifts us these truths and our first instinct is that it should be the property of a human.

Patents exist to incentivize invention. As long as mathematicians are content to do mathematics for the love of it, and they certainly are, there’s no need for mathematical patents.

Practically speaking, mathematical ideas are building blocks, not products. Patents on mathematical ideas discourage invention rather than encouraging it because they prevent use of that idea in new products - an idea that would have been discovered anyway. For example, the patents on elliptic curve cryptography and arithmetic coding were hugely damaging to invention overall. Patenting a new kind of corkscrew doesn’t have this problem: it’s a destination, not an intermediate.

The universe is not based on math, says math.

Math can be viewed as a product of how our minds work. We use abstractions to understand and predict the universe, but it's always imperfect, and the theories always incomplete.

E.g., you'd think 1+1=2 is some universal truth, except integers don't exist in nature, being just another abstraction that we came up with. And of course, people can rediscover integers repeatedly, but that just says more about how our mind works.

And yes, math is a building block, but so is software. If math theories aren't patentable, that should happen based on them being trivial or perhaps being too useful to society, and not due to some romantic notions of discovery and the universe. Software, too.

>Practically speaking, mathematical ideas are building blocks

So are technological ones.

But a machine can also be mathematically described. Should that render it unpatentable, or will that have to wait until the grand unified theory of everything is sorted out?
You can self host LLMs you know