Hacker News
by davidclark 384 days ago
From a recent blog post shared on HN, among the author’s list of reasons they can’t or won’t use LLMs:

>The training data for LLMs is stolen. I don’t mean like “pirated” in the sense where someone illicitly shares a copy they obtained legitimately; I mean their scrapers are ignoring both norms and laws to obtain copies under false pretenses, destroying other people’s infrastructure. [footnotes omitted]

“I think I'm done thinking about GenAI for now” https://news.ycombinator.com/item?id=44193018

1 comment

I do agree that the scraping is annoying, and I had to set up some anti-scraping measures on one of my image-heavy sites. However, I am for the freedom of data, especially if the models are open source. I only use local models and haven't used Claude or ChatGPT since last year, and it's pretty awesome.