Hacker News
by maxdo 1 hour ago
You invest billions of dollars and many months of work just to have everyone distill your model?
4 comments

>be me

>anthropic

>mine the internet for data, blasting millions of blogs with scrapers

>a few have to shut down, but that's just the price to pay

>finally, the chatbot is ready

>learn that there are EVIL cretins out there trying to scrape automated output from OUR product to build their chatbot

>build in safeguards to new model to stop this

>the users are mad, now the model accuses users of being bioterrorists if they so much as mention they have a cold

>mfw

Seriously... the gall of people just scraping a model for free data!
You wouldn't download an LLM for free, would you?
That might be an indication that the business is not sustainable because there is not any technical or practical differentiator besides scale. Harming your customers to maintain that differentiation isn't sustainable either.
No intellectual labor is sustainable if anyone can copy your data. Why have Microsoft if you can just copy Windows and run it?
Have you copied Windows and tried to run it? I would love to see the plain text source code that you claim to have. We all would.
Half of the developing world did. Guess what slowed the trend a bit? Protection.
There is a difference between being able to validate a Windows license and copying Windows from source code.

If we are talking about distillation vs. building from scratch, neither is comparable to Windows. I can build my own LLM [0] and then distill off of Claude, but that is not the same as a 1:1 copy of an operating system made possible by cracking its license checks. We are not seeing Windows clones at the source level, for that reason.

Also, Linux exists. Anyone can copy that. Why doesn't that count?

[0] https://huggingface.co/docs/transformers/quicktour

Did it really? Here in my <large 3rd world country> at least, afaik no one's stopped pirating. The tools to activate may have changed but haven't gone away.
It's the game. Because consumers reject it otherwise.

Why go to bat for anti-consumer behaviors unless you are a shareholder?

Their billions are not my problem, but the money I pay them and the service I get in return are. And if they can't provide, I will shop elsewhere (and do).

You invest billions of dollars in hosting and benefit from hundreds of millions of man hours of human output, just so everyone trains on "your" data?