Hacker News new | ask | show | jobs
by wolvoleo 77 days ago
It's more ironic because without all the scraping openai has done, there would have been no ChatGPT.

Also, it's not just the cost of the bandwidth and processing. Information has value too. Otherwise they wouldn't bother scraping it in the first place. They compete directly with the websites featuring their training data and thus they are taking away value from them just as the bots do from ChatGPT.

In fact the more I think of it, I think it's exactly the same thing.

1 comments

This leads me to thinking: I ask chatGPT a question and they get the answer from gamefaqs.

But what happens if gamefaqs disappears because of lack of traffic?

Can LLM actually create or only regurgitate content.

>Can LLM actually create or only regurgitate content.

Contrary to what others say, LLMs can create content. If you have a private repo you can ask the LLM to look at it and answer questions based on that. You can also have it write extra code. Both of these are examples of something that did not exist before.

In terms of gamefaqs, I could theoretically see an LLM play a game and based on that write about the game. This is theoretical, because currently LLMs are nowhere near capable enough to play video games.

It will remain in their scraped data so they can keep including it in their later training datasets if they wish. However it won't be able to do live internet searches anymore. And it will not generate new content of course. Especially not based on games released after the site codes down so it doesn't know. Though it could of course correlate data from other sources that talk about the game in question.
They cannot create original content.
Well they can make some up, like hallucination. That's an additional problem: when the original site that provided the training data is gone: how can they use verify the AI output to make sure it's correct?