Hacker News
by bugglebeetle 672 days ago
You don’t find the stealing argument logically sound because you immediately frame the theft as “analyzing” to suit your own narrative and then demand people engage with it, while proceeding to make further spurious claims like…

> I would be more convinced if AI used a fixed database during generation

Wow, I didn’t know that model weights, an elaborately compressed form of their training data, rewrote themselves every time they were invoked. Or that it’s only theft if I stole data from a fixed database to build my own service.


AI training is literally analyzing. That is how it works. Properly trained models (i.e., ones that aren't overparameterized or overfit) do not just "elaborately compress" training data, as this is not possible. For example, you cannot compress 1 billion images into 1 billion parameters and expect to retrieve them later.

If objective facts are "my own narrative", then no rational discussion can occur.
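
For anyone who wants to check the capacity claim above, here is a back-of-the-envelope sketch. It assumes 16 bits per parameter (a common storage precision, but an assumption; the comment doesn't specify one) and treats the weights as if they were nothing but an archive of the training set. The function name `bytes_per_image` is made up for illustration.

```python
def bytes_per_image(num_params: int, num_images: int, bits_per_param: int = 16) -> float:
    """Average storage budget per training image, if the weights were a pure archive.

    bits_per_param=16 is an assumed precision, not a figure from the thread.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / num_images


# 1 billion parameters spread evenly over 1 billion images
budget = bytes_per_image(1_000_000_000, 1_000_000_000)
print(f"{budget:.1f} bytes per image")  # 2.0 bytes per image
```

Under those assumptions the budget is about two bytes per image, far less than a typical compressed photo needs. That is an average over the whole training set, though, and says nothing about how capacity is distributed across individual images.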

Oh well, you should tell the folks at DeepMind and Meta about these objective facts then so they don’t waste any more time doing research:

https://arxiv.org/html/2309.10668v2

Maybe apply for a job there too, since you’re obviously so far ahead of everyone in understanding this problem space.

You absolutely can compress a subset of a billion images into a billion parameters if you throw out all but a thousand. Is it no longer copyright infringement if you also run enough irrelevant data through your algorithm alongside the images you’re stealing?
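
The subset scenario above can be put in numbers the same way. This is a rough sketch, again assuming 16 bits per parameter and, unrealistically, that all of the model's capacity goes to the retained images. The helper name `capacity_per_memorized_image` is hypothetical.

```python
def capacity_per_memorized_image(
    num_params: int, num_memorized: int, bits_per_param: int = 16
) -> float:
    """Bytes of weight storage available per image if only `num_memorized`
    images are retained (assumes all capacity is spent on them)."""
    return num_params * bits_per_param / 8 / num_memorized


# 1 billion parameters, only 1,000 images retained
per_image = capacity_per_memorized_image(1_000_000_000, 1_000)
print(f"{per_image / 1e6:.1f} MB per image")  # 2.0 MB per image
```

About 2 MB per image is in the range of an ordinary photo file, so the subset case is not ruled out by capacity alone. Whether a given model actually memorizes any particular image is an empirical question that this arithmetic does not settle.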