Hacker News
by dmoose 21 days ago
When did the first homo sapiens exist? Ideas, like species, evolve. Saying there are no original ideas seems to me an attempt to glibly capture something quite fundamental.
2 comments

Hi dmoose, your handle looks familiar to me. The non-glib answer is that we should give some very serious consideration to the possibility that language either functions like, or possibly is the same as, Jung's collective unconscious: the organically created repository of all of humankind's cognition and reason, accumulated over vast periods of time, deposited by billions of humans.

My way of "giving this serious attention" is through pre-registered, falsifiable, repeatable experimentation, which anyone can look up on osf.io because I use my real name. I'll bet you that none of the randos in this thread do as much.

To all of the randos: unless you have data... it is just an opinion.

> unless you have data... it is just an opinion

Glib as well, but this one hits home a lot harder. Well said.

I don't disagree with your premise, but I'd argue that saying "there are no original ideas" in the context of a discussion of plagiarism is needlessly reductive. Even though I think I mostly agree with the author here, I think there are legitimate counterarguments that can be made; equating all of the ways someone can cite or build upon an idea with copying something word-for-word and claiming it's your own is not one of them though.

No offense, but you sound like someone who has never built a language model. Anyone who has actually built one understands that there is no copying going on. Just predicting words (tokens actually).

The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.

In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.

> No offense, but you sound like someone who has never built a language model. Anyone who has actually built one understands that there is no copying going on. Just predicting words (tokens actually).

> The problem is that people's words are MUCH more predictable than they would like to believe. And that truth upsets them.

I'm not offended. I do think it's a little weird that you seem to think "training on a bunch of stuff that includes a set of words" and then "predicting" those words exactly is somehow okay because theoretically it might be extrapolating the exact same words from combining other ones. I'd argue that if a model trains on data, and then reproduces exactly a large subset of that data, the bar should be pretty high to prove that it's not copying, and "you don't understand because you didn't implement this" is not a good basis for law.

> In addition to having created models, I also write books and articles. Probably more than most people commenting here. I have a firm grip on what actual copyright law is and the pros and the cons of it.

I'm not convinced you have a firm grip on the idea that no matter how smart you may be, "just trust me bro" is a pretty terrible strategy if you're actually intending to convince anyone of anything. If that's not what your goal is here, it's not clear why it's worth your time to respond to other people's comments when you clearly have so many other productive ways to spend your time.