by eslaught
2682 days ago
> - I suggest going over some of the samples generated by the model. Many people react quite strongly, e.g., https://twitter.com/justkelly_ok/status/1096111155469180928.

Have you done a plagiarism search on that text to see how similar it is to the input corpus? I'm by no means an ML expert, but I've played around with models for random name generation and one thing I've noticed is that as the models become more accurate, they also become much more likely to just regurgitate existing names verbatim. So if you search the list of names and notice something that seems particularly realistic, it could be because it's literally taken in whole or in part from the training data set!

(The talking unicorn example on their page is also meant to demonstrate that, no, it's not just memorizing, but I think it's a bit more compelling to check from the raw samples.)
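
To make the "plagiarism search" idea concrete, here's a rough sketch of the kind of check I have in mind: measure what fraction of a sample's word n-grams also show up in the training text. The names `ngrams` and `overlap_fraction`, the whitespace tokenization, and the default `n=5` are all my own assumptions for illustration, not anything from the model's authors, and a real corpus would need an index rather than an in-memory set.

```python
def ngrams(tokens, n):
    """Return the set of all length-n token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(sample, corpus, n=5):
    """Fraction of the sample's word n-grams that also occur in the corpus.

    Hypothetical helper: tokenization is a naive lowercase whitespace split,
    which is only an assumption made to keep the sketch short.
    """
    sample_grams = ngrams(sample.lower().split(), n)
    if not sample_grams:
        # Sample is shorter than n tokens, so there is nothing to compare.
        return 0.0
    corpus_grams = ngrams(corpus.lower().split(), n)
    return len(sample_grams & corpus_grams) / len(sample_grams)
```

A value near 1.0 would suggest the sample is mostly copied; a value near 0.0 would suggest it's not a verbatim lift, though it says nothing about paraphrase. For the name-generation case the same check degenerates to asking whether the generated name is in the training list.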