Hacker News
by cdme 811 days ago
Google surfaces data — or it used to — LLMs and AI companies actively exploit it with zero benefit given to creators or users of the platforms they're now cannibalizing.
1 comments

The irony. I'm surprised that businesses built on selling Google search results are allowed to exist. I guess it's for the same reason that Google scraping the internet and building a product on top of it is allowed.

Then it only makes sense that scraped AI training data is also going to be tolerated, because to prove infringement you would need to reproduce a large language model like ChatGPT and show, by forensic analysis, that training it on your copyrighted content can produce a similar derivative of that content.

It's such an uphill battle for copyright holders. They need to replicate: copyrighted input ---> LM similar to ChatGPT4 ---> copyrighted output

So far it's not looking good for OpenAI, because it's possible to generate copyrighted output (type "spiderman" in Czech), so all that remains is demonstrating the middle layer (training an LM similar to ChatGPT4 on the copyrighted input), but that is unrealistically expensive.

I have a theory that all this money spent on large models is meant to make discovery impossible (as it would require access to $100 billion worth of GPUs).

The whole notion that AI can replace search is nonsense. It yields no benefit to the creators of the results it scrapes and the models hallucinate. It's worse for users and it's worse for everyone producing anything of note online.
But many ChatGPT users are not using Google as much, relying instead on LLMs + RAG.

ChatGPT is the new search engine and provides far more value to the end user than Google.

The issue seems to be that people want a payout from OpenAI... but it's a non-profit.

It's a shiny toy — it'll yield worse answers. Much like Google's own AI.
Google search is terrible. ChatGPT is definitively better for searching right now, and I often find myself reaching for it over Google for a wide category of questions.
Google search is terrible because Google's stopped caring about search quality in favor of monetization. It doesn't mean an LLM can outperform a traditional search engine that cares about said quality.