by dheera 1819 days ago
I suppose it really depends on whether it spits out verbatim reproductions of code, or whether it is the equivalent of a programmer with 10 years of experience who has seen a lot of code but isn't reproducing any of it verbatim.

We shall see, by Googling some of the code it spits out.
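
For anyone who wants something more systematic than Googling, here is a rough sketch of one way to check a generated snippet against a known source. It only uses `difflib` from the Python standard library; the function name `longest_verbatim_run` and the whitespace tokenization are my own assumptions for illustration, not anything Copilot or GitHub provides.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(generated: str, reference: str) -> str:
    """Return the longest run of whitespace-separated tokens found in both texts."""
    gen_tokens = generated.split()
    ref_tokens = reference.split()
    # autojunk=False so frequent tokens (braces, semicolons) are not ignored.
    matcher = SequenceMatcher(None, gen_tokens, ref_tokens, autojunk=False)
    match = matcher.find_longest_match(0, len(gen_tokens), 0, len(ref_tokens))
    return " ".join(gen_tokens[match.a:match.a + match.size])


# Hypothetical inputs: in practice `generated` would be model output and
# `reference` a file you suspect it was copied from.
generated = "def area ( r ) : return 3.14159 * r * r # circle"
reference = "def circle ( r ) : return 3.14159 * r * r"
shared = longest_verbatim_run(generated, reference)
```

A long shared run is only a hint, not proof of copying: short idioms and boilerplate will match in almost any codebase, so the length threshold is a judgment call.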

FWIW GPT-3 doesn't really tend to spit out verbatim reproductions of copyrighted books.

4 comments

Copilot spitting out fast inverse square root, verbatim: https://twitter.com/mitsuhiko/status/1410886329924194309

HN discussion: https://news.ycombinator.com/item?id=27710287

I have worked a bit with transformers, the architecture underlying GPT. They absolutely learn to copy training data, and that’s perfectly normal.

What is happening here is we’re running into exactly what modern ML is NOT capable of: deductive reasoning. It does not think “I need to query the Twitter API for some posts, then filter them. Right, the API works like this…” No. It doesn’t think at all. It is a regression machine. “This sequence begins/looks like something I have seen before, here’s the corresponding output modulo adaptations.”

ML does not self-reflect, question motives and analyse causes. It’s just a complete lie to suggest otherwise, and to call this “pair programming”? What an absolute joke. It’s a lot like Tesla calling its glorified lane keeping an autopilot.

> FWIW GPT-3 doesn't really tend to spit out verbatim reproductions of copyrighted books.

But it does spit out whole paragraphs at a time. This is easy to test by going to any of the GPT-2/3 playgrounds online (e.g. AI Dungeon) and playing with prompts. Very specific prompts work best, but sometimes even with a generic prompt, if you let the model continue on its own past the first output, it might just shunt itself onto a path where following the most probable continuations happens to reproduce a substantial portion of some work verbatim.
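
To make the "most probable continuation" point concrete, here is a toy sketch. It is a tiny n-gram lookup table, not a transformer and nothing like GPT in scale, and the names `train` and `greedy_continue` are made up for this example. It only shows the mechanism: when a context was followed by exactly one thing in the training text, always taking the most likely next token replays that text verbatim.

```python
from collections import Counter, defaultdict


def train(tokens, order=2):
    """Count which token follows each context of `order` consecutive tokens."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        table[context][tokens[i + order]] += 1
    return table


def greedy_continue(table, prompt, order=2, max_new_tokens=50):
    """Repeatedly append the most frequent next token for the current context."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        context = tuple(out[-order:])
        if context not in table:
            break  # the model has never seen this context, so stop
        out.append(table[context].most_common(1)[0][0])
    return out


# Toy "training set": one sentence, so every two-token context is unique.
corpus = "the quick brown fox jumps over the lazy dog".split()
table = train(corpus)
completion = greedy_continue(table, prompt=["the", "quick"])
assert completion == corpus  # the two-word prompt replays the whole sentence
```

A real model generalizes far more than this table does, but the failure mode is the same in spirit: a sufficiently specific prompt can pin the continuation to a single memorized sequence.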