| HN Mirror



	by eslaught 2682 days ago
	> - I suggest going over some of the samples generated by the model. Many people react quite strongly, e.g., https://twitter.com/justkelly_ok/status/1096111155469180928.

	Have you done a plagiarism search on that text to see how similar it is to the input corpus? I'm by no means an ML expert, but I've played around with models for random name generation, and one thing I've noticed is that as the models become more accurate, they also become much more likely to just regurgitate existing names verbatim. So if you search the list of names and notice something that seems particularly realistic, it could be because it's literally taken in whole or in part from the training data set!

1 comment

czr 2682 days ago

You're welcome to check out the samples [https://raw.githubusercontent.com/openai/gpt-2/master/gpt2-s...] and evaluate them for memorization yourself (I haven't found any so far).

(The talking unicorn example on their page is also meant to demonstrate that, no, it's not just memorizing, but I think it's a bit more compelling to check the raw samples yourself.)
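
For anyone who wants to try the kind of check discussed above, here is a minimal sketch of a verbatim-overlap test. It assumes you have some reference text to compare against (the `corpus` string is a stand-in, not the model's actual training data), and the names `ngrams` and `overlap_fraction` are made up for illustration. It only catches word-for-word copying of n-word runs, so treat a low score as weak evidence rather than proof of no memorization.

```python
def ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(sample, corpus, n=8):
    """Fraction of the sample's n-grams that also appear verbatim in the corpus.

    Returns 0.0 when the sample has fewer than n words.
    """
    sample_grams = ngrams(sample, n)
    if not sample_grams:
        return 0.0
    corpus_grams = ngrams(corpus, n)
    return len(sample_grams & corpus_grams) / len(sample_grams)


if __name__ == "__main__":
    # Hypothetical stand-in texts, not real model output or training data.
    corpus = "the quick brown fox jumps over the lazy dog near the river bank today"
    copied = "quick brown fox jumps over the lazy dog"
    novel = "a talking unicorn spoke perfect english in the valley"
    print(overlap_fraction(copied, corpus, n=4))
    print(overlap_fraction(novel, corpus, n=4))
```

A score near 1 means the sample is mostly lifted from the reference text; the choice of `n` is a judgment call, since short n-grams match by chance and long ones miss lightly edited copies.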
