Hacker News
by miki123211 77 days ago
A third way of looking at it is that you can't just blindly copy arguments when the situations are clearly different.

Nobody, not even Anthropic, is arguing that they should be able to host other people's paid content for free. The crux of their fair-use defense is that models are transformative works, just like parodies or book reviews, and hence should be treated as fair use.

You can't just take a pile of books (no pun intended) and turn that into Claude in a day with 30 lines of Python. There's a lot of work and know-how on the Anthropic side that goes into making a good LLM.

3 comments

Anthropic argues that you should not use the Claude API to train your own model.

Situation A - Anthropic pays for a book - Anthropic transforms the book into a new LLM (transformative use) -> OK

Situation B - I pay for Anthropic API - I transform API responses into a new model (transformative use) -> Not OK

The situations are clearly the same.

Anthropic goes book->llm, you do llm->llm. Very different amounts of transformativeness.
This is the most honest argument for it. I respect that.

My impression is that even if open models did 'distill' Claude, they contributed some interesting and productive ideas of their own, like DeepSeek's more efficient attention.

...idk...both transformations use transformers... thereby they both achieve adequate levels of "transformativeness" /s
If lossy-compressed transcodes of ripped movies are not "transformative works" and can even get people jailed, then neither is lossy-compressed text of ripped books and websites.

There is a lot of know-how going into a good DivX rip too, you know.

And it enables so many novel uses, such as Popcorn Time, with flourishing business opportunities.

You wouldn't download a car. They did.

It's 200 lines of Python.
Do you really believe that? It's not just the training run, it's the whole infra around it as well.
It's an exaggeration for sure, but I don't think it's a stretch to believe Anthropic spends considerably more effort on data scraping and curation than on anything else.