| HN Mirror

by fl4tul4 488 days ago

I do love competition.

In the last few weeks we have been seeing a torrent of advances, just because someone opened their architectures.

Imagine where we could go if the training datasets were also publicly available and unbounded by any copyright laws. (I'm not talking about doing anything illegal).

I can only dream, I guess.

4 comments

Lucasoato 488 days ago

A torrent of advances is the right way to word it, especially after it has been discovered what Meta trained their models on :)

paper2d 488 days ago

Those training datasets can never be free, as almost all of them are copyrighted.

landryraccoon 488 days ago

Japan has said AI can train on copyrighted materials.

https://www.privacyworld.blog/2024/03/japans-new-draft-guide...

I imagine if copyright is a big issue for AI, Japanese startups will have an advantage.

0xdeadbeefbabe 488 days ago

Does China need to say anything or can you guess their policy?

chii 488 days ago

Perhaps copyright needs to be updated. In any case, my personal belief is that training on publicly released data, as well as on purchased media, is fair use.

philipwhiuk 488 days ago

If anything it needs to be updated to actually prevent the rampant profit extraction from human creation in order to protect actual creators.

FergusArgyll 488 days ago

Not OP, but that should be part of the update, I think.

I think we can all agree there does need to be an update. You don't want to forever outlaw deep learning (even if you do want to, that's not going to happen, so it's worth helping to shape the future).

It's very complicated, with a bunch of moving parts, but I really want society to start arguing about it so we can get to a semi-fair place.

realusername 487 days ago

I don't see how any of these authors loses money when you use ChatGPT, even in theory.

You weren't going to buy a book instead of asking a question.

chii 487 days ago

The claim that authors lose money from ChatGPT's use of their works in training is the same idea as the claim that piracy costs music labels money.

realusername 486 days ago

And we know from research that the idea of piracy costing money is bogus.

LLMs costing authors money makes even less sense, as you can't get back the source material.

woah 488 days ago

Each time someone clicks "send" on ChatGPT, Warner Bros gets 1c

$25 to Elsevier per GPU purchase

eikenberry 487 days ago

I don't think you will ever see any law that benefits the creators. Better to eliminate copyright and at least give artists the freedom to work with any media they want. Artists will generally still be poor, but they'll be more creative.

anigbrowl 487 days ago

Creativity and productivity are two completely different things.

spookie 488 days ago

I'll be honest, even if this comment won't fly: It is impossible to change the views here, on this point. Specifically, here.

I do share your opinion. Others may argue "What about x country? They don't care!", even though that position is about as good as making anything excusable because someone else did it.

I might add, I'm really not trying to be toxic. Just saying this based on what I see when this comes up.

CamperBob2 488 days ago

Yeah, that's a good idea. Stop the most important advance in storing, retrieving, and disseminating knowledge since the printing press because muh copyright!!1!!

Never mind that you've just handed control of an incredibly-powerful tool over to nations that DGAF about copyright law.

If copyright interests want to fight AI, then copyright has to go. It's that simple. It's an unnecessary fight, but somebody needs to convince them of that.

tonyedgecombe 488 days ago

The UK government is doing that at the behest of the AI companies, which tends to indicate they have been misbehaving up to now.

azinman2 488 days ago

Why should it be? I’d personally be pissed if my book, which came from my own hard work and is sold per person, all of a sudden got subsumed by a general AI. Even worse if it is commercialized and I get nothing for it.

chii 487 days ago

What if a classroom of students learnt from your book and ended up with high-paying jobs, innovations, or products, none of which made any profit for you as the author of said book (except for the copy sold to each student)?

azinman2 486 days ago

That’s perfectly in line with the common role and understanding of books.

taosx 488 days ago

Share the non-copyrighted ones and it's still a win if you make it possible for people to contribute through PRs, testing, and discussion.

lionkor 488 days ago

Almost all free things are copyrighted.

Kye 488 days ago

It seems like the torrent was already happening and DeepSeek's part is just one example of that. They did help bring attention to those advancements, and that's led to lots more people contributing and finding more niche applications.

noduerme 488 days ago

Isn't the general attitude these days to just break laws and bribe officials once you own the hottest startup? /s

edit: re. the /s: I was living offshore and running the most popular bitcoin casino at the time, spending a vast amount of money and energy to block any player who might be American. As a result I didn't make that much money. And I tried to calculate how much I would need to make if I wanted to break the law and hide out forever. I figured I could make $10-15M a year, but that wouldn't be enough to hide. I fucked up, I guess. Because the richest man in the world made most of his first round of money facilitating gambling transactions, and he's now got his snout in every federal agency. I should have had the balls, I guess, to ask forgiveness rather than permission.

coliveira 488 days ago

It was always like this. YouTube started out publishing mostly copyrighted content, then Google settled with copyright owners. Google, by the way, has perfected the "art" of training its algos on content without approval from copyright owners.
