| HN Mirror



	by robbiep 762 days ago
	I’m not involved in the space, but it seems to me that exposing a model, particularly a massive one, to a corpus of text such as a book in the training data would have very minimal impact. I’m aware that people have been able to pull data ‘out of the shadows’ of the training data, but to my mind a model being mildly influenced by the weights between different words in this text hardly constitutes hard recall; if anything, it now ‘knows’ a little of the author’s linguistic style. How far off am I?

2 comments

int_19h 762 days ago

It depends on how many times the model saw that text during training. For example, GPT-4 can reproduce ayats from the Quran word for word in both Arabic and English. It can also reproduce the Navy SEAL copypasta, complete with all the typos.


kaibee 762 days ago

Poe's "The Raven" also.


19h 762 days ago

Brothers in username.. :-)


Salgat 762 days ago

Remember, it's also trained on countless internet discussions of and papers about the book.
