by winety 1455 days ago
> There is a problem because some licenses require attribution, but ignoring that...

Surely the solution would be to give credit to every author from the training corpus. I am looking forward to the 10 000 lines of copyrights in every header. :P

If Microsoft had trained it on its own code, there would be no such problems. Surely a company as large as Microsoft has produced enough code over the years to create a large enough training dataset.


> If Microsoft had trained it on its own code, there would be no such problems.

I keep seeing this sentiment from the GPL/"laundering" side of the debate.

Believe me, Microsoft wouldn't have released this thing (after what, 6 months of beta testing?) if they thought they had any "problems" at all.

I'm not saying I don't sort of agree with you, but is there no room for what's actually _likely_ to happen in this debate? Because as best as I can tell, they aren't going to see any real legal issues from this.

(There's also an option to remove generations that result in a collision with actual GitHub code, just fyi)

I feel like when the singularity happens, HN is going to be flooded with programmers mad that they got automated away, despite that very much being one of the primary goals of computer science and software engineering. This stuff is kind of just a fact of life now.

Salesforce trained models (on GitHub) competitive with Copilot without needing to own GitHub. I would spend less time worrying about how to lawyer up and more time figuring out how you're going to adapt to these new tools. That's the gig.

Microsoft made a bet that releasing Copilot will bring in more profit than the legal issues might cost them. That says nothing about whether or not there is a problem with it.

The simple way to test the legal theory behind Copilot would be to write an AI that writes music, using music scraped from YouTube or any other large music library. The idea that one can train on "publicly available material" and produce algorithms that output large chunks of copyrighted material is a bit untested in court, but go against the wrong target and we will quickly see a response. We have actually seen some traces of this with news bots that scrape news sites and produce "novel" interpretations of existing news, especially sports news.

This is what I'm talking about. Are we commenting on a news report of someone actually doing what you're describing - filing a suit or legal action of any kind against MS for this?

No, we're not. Further, Amazon just announced a similar product and Salesforce has literally _released weights_ for their code models. You can't put the genie back in the bottle.

Actually enforcing any action when the representations are learned rather than hard-coded just seems impossible to me. They have a check box that removes any predictions matching existing code - that basically makes it impossible to discern the source since this will be based on some subjective "semantic closeness" BS.

> Believe me, Microsoft wouldn't have released this thing (after what, 6 months of beta testing?) if they thought they had any "problems" at all.

They would. Did you just forget Tay? Microsoft didn't consider that 4chan would train her to be the ultimate racist.

As you mentioned “WHEN the singularity happens” as an article of religious faith, followed by a vast leap of faith in proposing no-code tools taking over programming, I’m afraid that you adapting to the reality on the ground will be the difficult part here, rather than the lack of adaptation by programming writ large.

Do you work in marketing? Do you program?

I'm only somewhat certain that the singularity is inevitable (and obviously my predictions aren't worth betting on anyways) - sorry for using poetic language.

I'm a machine learning engineer, amateur researcher and open source contributor. Before that I was a software engineer for 8 years.