Hacker News
by timewizard 526 days ago
> LLMs are not massive archives of data.

Neither am I, yet I am still capable of reproducing copyrighted works to a degree that most would describe as illegal.

> And before you knee-jerk "it's a compression algo!"

It's literally a fundamental part of the technology, so I can't see how you can call it "knee-jerk." It's lossy compression, the same way a JPEG is, and simply recompressing your picture to a lower resolution does not at all extinguish your copyright.
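
To make the "lossy" part concrete, here is a toy sketch in Python. It is not how JPEG actually works (JPEG uses a DCT plus quantization); the function names and the pixel values are made up for illustration. It only shows that a lower-resolution copy cannot be restored exactly, yet stays recognizably close to the original.

```python
def downsample(pixels, factor):
    """Lossy step: replace each run of `factor` values with its integer mean."""
    out = []
    for i in range(0, len(pixels), factor):
        chunk = pixels[i:i + factor]
        out.append(sum(chunk) // len(chunk))
    return out


def upsample(small, factor, length):
    """Rebuild a full-length row by repeating each stored value."""
    out = []
    for value in small:
        out.extend([value] * factor)
    return out[:length]


# Hypothetical row of grayscale pixel values.
original = [10, 12, 200, 202, 90, 94, 30, 34]

stored = downsample(original, 2)               # half the data is kept
restored = upsample(stored, 2, len(original))  # best-effort reconstruction

# The largest per-pixel difference between the original and the round trip.
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

The restored row is not the original row, but the error is small next to the pixel range, which is the sense in which the recompressed picture is still "the same picture."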

> I invite you to archive all your data with an LLM's "compression algo".

As long as we agree it is _my data_ and not yours.

1 comment

> It's lossy compression, the same way a JPEG might be

Compression, yes, but this is commingling as well. The entire corpus is compressed together, which identifies common patterns, and in the model the documents' representations now essentially overlap.

The original document is represented statistically in the final model, but you’ve lost the ability to extract it faithfully. Instead, you gain the ability to generate something statistically similar to a large number of original documents that are related or structurally similar.
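
A toy way to see the commingling, as a sketch only: a word-level bigram table in Python. This is a stand-in I'm assuming for illustration, not how an LLM is trained, and the three "documents" and the function names are made up. Every document feeds the same table, so after training you can't tell which document a given transition came from.

```python
import random
from collections import defaultdict


def train(corpus):
    """Merge every document into one shared table: word -> list of next words."""
    table = defaultdict(list)
    for doc in corpus:
        words = doc.split()
        for current, following in zip(words, words[1:]):
            table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Walk the shared table, picking each next word at random."""
    rng = random.Random(seed)
    words = [start]
    # Stop early if the current word was never followed by anything.
    while len(words) < length and table.get(words[-1]):
        words.append(rng.choice(table[words[-1]]))
    return " ".join(words)


# Hypothetical three-document corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

model = train(corpus)
sample = generate(model, "the", 6)
```

Here `model["the"]` holds followers drawn from all three documents at once, so `sample` is built only from transitions that occurred somewhere in the corpus but is not guaranteed to match any single document.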

I’m just commenting, not disputing any argument about fair use.