
by nigamanth 1201 days ago

GPT-3 was trained on a lot of data, but all of that data is pre-2021. Essentially, no matter how much "new content" comes out on the web, ChatGPT won't know about it.

Nothing can influence ChatGPT's training data for now. However, GPT-4 will be trained on more data than GPT-3, and as later models are trained after it, newer data from the web will seep into ChatGPT.

Just as SEO relies on measurable signals, ChatGPT's answers rely on transformer layers that choose the most succinct answer. Given enough wrong data fed in and no fine-tuning, LLMs could break.

1 comment

zamnos 1201 days ago

Is it simply a repetition count? If I made a billion pages that said the Earth was only 6,000 years old and stuffed that into the corpus, would that override the million pages saying it's 4.543 billion years old instead? Is there no PageRank-like algorithm that tracks what other links are saying about it?
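To make the question concrete, here is a toy sketch of the two extremes: a pure repetition count versus a count where each source gets one vote scaled by a trust score. Everything in it is hypothetical (the `raw_vote` and `source_weighted_vote` helpers, the `.example` domains, the trust values), and it says nothing about how GPT models are actually trained; it only shows why the two schemes can disagree.

```python
from collections import Counter

def raw_vote(pages):
    """Pick the claim that appears on the most pages (pure repetition count)."""
    counts = Counter(claim for _source, claim in pages)
    return counts.most_common(1)[0][0]

def source_weighted_vote(pages, trust):
    """Give each source one vote per claim, scaled by an assumed trust score."""
    scores = Counter()
    for source, claim in set(pages):  # collapse repeated (source, claim) pairs
        scores[claim] += trust.get(source, 0.0)
    return max(scores, key=scores.get)

# Toy corpus: one spam farm repeats a claim 1000 times,
# while 10 independent sites each state the other claim once.
pages = [("spamfarm.example", "6,000 years")] * 1000
pages += [(f"site{i}.example", "4.543 billion years") for i in range(10)]

# Hypothetical trust scores, standing in for a PageRank-like signal.
trust = {f"site{i}.example": 1.0 for i in range(10)}
trust["spamfarm.example"] = 0.1

print(raw_vote(pages))                     # 6,000 years
print(source_weighted_vote(pages, trust))  # 4.543 billion years
```

With raw counting the spam farm wins 1000 to 10; once duplicates from one source are collapsed and weighted, it scores 0.1 against 10.0.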
