Hacker News new | ask | show | jobs
by mrweasel 746 days ago
Someone could easily create a such a license. Free to use and distribute, $10,000 per line used for AI model training.

I'll very naively assume that Amazon, OpenAI, Google and others check licenses before feeding data to their models. I'll stop assuming that when one of these companies admit that they don't actually care and it's not profitable for them to respect licenses.

1 comments

To make that enforceable, it would be nice to prove the AI was trained on it.

You might insert a "sleeper/activator" pair. The sleeper is a watermark that the AI will recall verbatim. To make it provide the sleeper, we give the AI a special activator prompt.

Demonstrating that your public repo successfully poisoned the AI with a watermark could become a court admissible proof of unauthorized scraping.