by feelandcoffee 933 days ago
I wonder if this would become more common with things like ChatGPT.

Let's say you've been working in place A, you show your code to an LLM service (like the dozen or so Copilot-like services) and tell them to refactor. And for the sake of argument, let's say the LLM uses your code and questions for its next training dataset.

A few years pass, then you go to work at Place B, and ask a question that happens to be related to the problem that Place A's code solved, and they give you Place A's code as is.

2 comments

For this reason, and a few others, my workplace simply put a blanket ban on these kinds of tools. If our code is never exposed to the learning tool, it’s never in danger of showing up somewhere else.

Incidental to that, I feel like these tools expose the reality behind “copyrighting code/math” and how fallacious it is. If the tool can generate the efficient methods of achieving a result, I think it becomes obvious that one shouldn’t be able to protect it via IP law.

Just like with social media, all it takes is one person to not honor that request, and boom! your shit is out there. Sure, you can fire the offending party, but you can't just ask Co-pilot to not use your contributions. That's like asking the internet to give those pictures back. It ain't gonna happen.
It's quite different from the analogy you suggest, as Copilot is controlled by a single organisation and we know the address.
I’m assuming you’re implying that a firewall rule can be applied to block access from the corp network. However, this is clearly ignoring the fact that work from home exists where the corp network can be bypassed.
> If the tool can generate the efficient methods of achieving a result, I think it becomes obvious that one shouldn’t be able to protect it via IP law.

But these kinds of tools can only do that because someone else already put in the work to write the solutions that are used to train their models. Isn't this exactly the kind of situation when copyright is supposed to apply?

But with enough training data, it's not generating it because it remembers the exact code line for line, it does it because it knows that to be a good method. Especially if you ask it to refactor it, that's a whole new creation even if it's been done before by some engineer somewhere.
It's still parroting what other people did, it's not doing any math reasoning, and it's not any different to LLMs seemingly able to compose prose or poetry.

If you want to make an argument that math or software shouldn't be copyrighted, LLMs actually make the case for stronger copyright protections.

> If you want to make an argument that math or software shouldn't be copyrighted, LLMs actually make the case for stronger copyright protections.

Maybe, but as long as managers and shareholders all over the world are excited about the upside of the new technology, this is very unlikely to happen. ;)

LLMs would be dead in the water legally, if their owners had to account for every bit of IP the LLMs have been trained with.

If you use GitHub, you feed OpenAI with your code as training data already, with GitLab you do the same for Google.
> If you use GitHub, you feed OpenAI with your code as training data already, with GitLab you do the same for Google.

Do you have some evidence that github trained copilot on private repositories? They've been pretty clear about claiming they only used public repos.

Also, gitlab is not owned by Google AFAICT but is instead a publicly traded company.

Some people complained that Copilot output their rare code almost verbatim, so I have no reason to trust whatever GitHub/Microsoft state.
If you're going to make these kinds of accusations (the kind that, if proved true, would lead to multi-million dollar lawsuits), you should at least try to provide sources.
One example:

https://devclass.com/2022/10/17/github-copilot-under-fire-as...

Also there are multiple lawsuits already as you've probably noticed.

Why don't you just look it up? It was on HN front page some time ago.
If you use on-prem gitlab, presumably that is not the case.
Self-hosted LLM is really the only way to do this.
>If the tool can generate the efficient methods of achieving a result, I think it becomes obvious that one shouldn’t be able to protect it via IP law.

Why does that only hold when the result in question is in software? Machines are just tools for achieving results.

Because you can patent a machine. The argument is that software is "just math" (because it literally is just doing binary arithmetic) and mathematics cannot be patented.
Math should be patentable, too. I see no reason why not.

The old argument that it's discovered rather than invented is bullshit. Multiple people can always have the same idea for an invention because we think alike and live in the same environment.

Or just ban patents altogether. Of course, this may discourage companies from investing in R&D and that's the real problem: how expensive is it to invent something, and does it justify a 20-year monopoly? But there are no good answers here, and trying to draw a line between math and non-math is bollocks.

There’s just something obscene about patenting mathematics. The universe gifts us these truths and our first instinct is that it should be the property of a human.

Patents exist to incentivize invention. As long as mathematicians are content to do mathematics for the love of it, and they certainly are, there’s no need for mathematical patents.

Practically speaking, mathematical ideas are building blocks, not products. Patents on mathematical ideas discourage invention rather than encouraging it because they prevent use of that idea in new products - an idea that would have been discovered anyway. For example, the patents on elliptic curve cryptography and arithmetic coding were hugely damaging to invention overall. Patenting a new kind of corkscrew doesn’t have this problem; it’s a destination, not an intermediate.

The universe is not based on math, says math.

Math can be viewed as a product of how our minds work. We use abstractions to understand and predict the universe, but it's always imperfect, and the theories always incomplete.

E.g., you'd think 1+1=2 is some universal truth, except integers don't exist in nature, being just another abstraction that we came up with. And of course, people can rediscover integers repeatedly, but that just says more about how our mind works.

And yes, math is a building block, but so is software. If math theories aren't patentable, that should happen based on them being trivial or perhaps being too useful to society, and not due to some romantic notions of discovery and the universe. Software, too.

>Practically speaking, mathematical ideas are building blocks

So are technological ones.

But a machine can also be mathematically described. Should that render it unpatentable, or will that have to wait until the grand unified theory of everything is sorted out?
You can self host LLMs you know
For this, ChatGPT has a 'private' mode in which your conversation exists only while you keep it open. It's not used for training, and no human sees it (presumably). The downside is that it disappears with no history, so you can't continue the next day. That was introduced after complaints similar to yours. Some companies put a total ban in place.
Whoops. I posted a comment, then changed my mind and disagreed with my premise. So I went to delete it, but somehow didn't. Sorry for the noise.