Hacker News
by butterfi 894 days ago
While I don’t necessarily agree with the NYT, I fail to see how or why LLMs are entitled to consume other people’s work for their own material gain.

That's pretty much the entire point of many publications. You think readers of the Financial Times aren't reading the FT in the hope of material gain for themselves? What about Wall St analysts? Consuming something for gain is not copyright infringement; distributing it for gain is.
The people who read the FT usually pay for it. Most of these LLMs are trained on a set of pirated content that they didn't pay for - https://shkspr.mobi/blog/2023/07/fruit-of-the-poisonous-llam...

Most copyrighted works will specifically say that the customer / user is prohibited from storing and reproducing those works.

Yet fair use can trump the owner's prohibitions. Your ISP can cache copyrighted materials, storing and reproducing them for other customers. Your browser stores the copyrighted images in your cache and 'reproduces' them if you browse the same page again.

It's a complicated area, not clear-cut at all.

If it’s illegal to make any material gain off skills learned through other people’s work, we’re all criminals.
Computers aren't humans.

I feel like I'm going to be saying this a lot in the coming years, as more and more people's brains get broken by false anthropomorphization.

Maybe getting too off topic for the thread, but it feels like equating machine and human output reaches a level of nihilism even I shudder at. I think (hope) there is intrinsic value in something being made by a human being even if a machine could do comparable work 100x faster.
On this point, you and I agree.
Exactly this. If I read a blog summary of a paywalled article that enhances my knowledge and I use it to do my day job better, did I infringe on the original copyright?
If you regurgitate the paywalled article verbatim, as a service, for customers, then yes, you infringed. If you didn't, and you didn't build a system that has some probability of doing so, then no, you didn't. How is this so hard to understand?
Because it’s a hard problem! There are nuances to this complex problem that need to be thought through before reducing it too much.

In this case, then, regurgitation is the problem, not the fact that it was ‘read’.

If the models ensured that the probability of regurgitation was near zero, would that be ok?

If I had a gadget that might steal your life's savings, but assured you the probability was "near-zero", would you be ok with that?

Perhaps you personally would be fine with it. But would it be ok for a court to declare that someone has no recourse, and must accept such an uncompensated risk?