
> No offense, but you sound like someone who has never built a language model. Anyone who has actually built one understands that there is no copying going on. Just predicting words (tokens actually).

> The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.

I'm not offended. I do think it's a little weird that you seem to think "training on a bunch of stuff that includes a set of words" and then "predicting" those exact words is somehow okay because, theoretically, the model might be arriving at the same words by combining other ones. I'd argue that if a model trains on data and then reproduces a large subset of that data exactly, the bar should be pretty high to prove that it's not copying, and "you don't understand because you didn't implement this" is not a good basis for law.

> In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.

I'm not convinced you have a firm grip on the idea that, no matter how smart you may be, "just trust me bro" is a pretty terrible strategy if you actually intend to convince anyone of anything. If that's not your goal here, it's not clear why it's worth responding to other people's comments when you clearly have so many other productive ways to spend your time.