
Hacker News


	by pdxww 2569 days ago
	The model needs to be retrained from scratch for different types of text. One can release a model trained to generate Trump tweets, but it's not of much use for generating fake news on a specific topic.

3 comments

gwern 2568 days ago

Not in the least. It's quite easy to retrain, even for very different domains. Like my GPT-2 poetry: https://www.gwern.net/GPT-2 Or google around and look at all the things people have been retraining GPT-2 on, like https://www.reddit.com/r/SubSimulatorGPT2/


p1esk 2568 days ago

Can you please show us the best poetry example you generated? Does it rhyme?


gwern 2568 days ago

Most of the examples don't rhyme. It's unclear to me if this is because most of the original poetry doesn't rhyme so it's just faithfully replicating the lack of rhyme, or if it only partially and accidentally grasps the idea of rhyme.

As for the best one, I quote the ones that struck me during the training process, and some are highlighted in https://www.gwern.net/GPT-2#unconditional-samples

Some of the ones I like are 'We never say "Thank you"', 'Thy soul, thy very soul is burning!', '"It is morn!" said the clover-bush', 'And they have seen the last light fail', 'There comes a murmur low and sweet'.

Probably the best IMO is 'The sun is gone, and the night is late', but of course everyone will have a different favorite.


p1esk 2568 days ago

Yes, "The sun is gone..." starts out amazingly well. But later fixates on tides for some reason :)

Everything is generated by the 117M model, correct? If so, do you expect the quality to improve with larger models, or is there not enough poetry to train them on? I wonder how much of all existing poetry is contained in the Gutenberg poetry corpus...

By the way, here's some poetry which has been generated by a Markov model: http://www.kurzweilcyberart.com/poetry/rkcp_poetry_samples.p...
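
For comparison with GPT-2: a word-level Markov model only records which word followed which in its training text, then generates by repeatedly sampling one of the recorded successors of the current word. Below is a minimal order-1 sketch; it is not necessarily how the linked samples were produced, and `build_chain`, `generate`, and the toy corpus are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words observed immediately after it (duplicates kept)."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int, seed: int = 0) -> list[str]:
    """Walk the chain from `start`; frequent successors are picked more often.

    Stops early if the current word was never followed by anything in the corpus.
    """
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and out[-1] in chain:
        out.append(rng.choice(chain[out[-1]]))
    return out


# Toy corpus, made up for illustration.
chain = build_chain("the cat sat on the mat and the dog sat on the log")
print(" ".join(generate(chain, "the", 8)))
```

Because an order-1 model conditions only on the previous word, its output tends to lose coherence over longer spans, which is the gap models like GPT-2 close by conditioning on much longer context.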


gwern 2567 days ago

It's a mix of OA 117M and 345M at the moment. I haven't observed too much in the way of overfitting yet, so there should still be benefits to going up another 4.4x in model size to 1.5B. My guess is that at 1.5B, it'll start being more important to improve the poetry corpus, since you can already start to see problems with it - the Alexander Pope brokenness and the occasional prose generation of footnotes/commentary are definitely undesirable, and I suspect there would be less 'run on' effect in samples if the original corpus actually properly marked '<|endoftext|>' for each poem...
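
On the '<|endoftext|>' point: GPT-2 uses the literal string `<|endoftext|>` as its document separator, so a fine-tuning corpus can tell the model where each poem stops by appending the marker after every poem, and generated samples can then be cut at the first marker. A minimal sketch, assuming the poems are already split into a list of strings; `build_training_text` and `first_poem` are hypothetical helpers, not part of any GPT-2 codebase.

```python
EOT = "<|endoftext|>"  # GPT-2's end-of-text marker string


def build_training_text(poems: list[str]) -> str:
    """Concatenate poems, ending each one with the end-of-text marker.

    Blank entries are skipped and surrounding whitespace is trimmed so the
    marker sits on its own line right after the last line of each poem.
    """
    pieces = []
    for poem in poems:
        body = poem.strip()
        if body:
            pieces.append(body + "\n" + EOT + "\n")
    return "".join(pieces)


def first_poem(sample: str) -> str:
    """Keep only the text before the first end-of-text marker in a generated sample."""
    return sample.split(EOT, 1)[0].strip()


# Placeholder poems, made up for illustration.
corpus = build_training_text(["Roses are red,\nviolets are blue\n", "", "A second poem\n"])
print(corpus)
```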


nl 2569 days ago

This is untrue. It's relatively easy to fine-tune a GPT model to a new domain.
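
The cost difference between training from scratch and fine-tuning shows up even on a toy problem. The sketch below is an analogy, not GPT-2: a two-parameter linear model is first trained on a "base" task, then trained on a related "new" task twice, once from zero weights (from scratch) and once starting from the base weights (fine-tuning), counting gradient-descent steps until the mean squared error drops below a threshold. The tasks, learning rate, and threshold are arbitrary assumptions.

```python
Data = list[tuple[float, float]]


def make_data(slope: float, intercept: float) -> Data:
    """Noise-free points on a line, with x evenly spaced over [-1, 1]."""
    xs = [i / 10 for i in range(-10, 11)]
    return [(x, slope * x + intercept) for x in xs]


def mse(w: float, b: float, data: Data) -> float:
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w: float, b: float, data: Data, lr: float = 0.1,
          tol: float = 1e-4, max_steps: int = 10_000) -> tuple[float, float, int]:
    """Plain gradient descent on MSE; returns (w, b, number of steps taken)."""
    n = len(data)
    steps = 0
    while mse(w, b, data) > tol and steps < max_steps:
        grad_w = 2 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2 / n * sum((w * x + b - y) for x, y in data)
        w -= lr * grad_w
        b -= lr * grad_b
        steps += 1
    return w, b, steps


# "Pretraining": learn the base task starting from zero weights.
base_w, base_b, base_steps = train(0.0, 0.0, make_data(3.0, 2.0))

# A related "new domain": a slightly different line.
new_task = make_data(3.3, 2.2)
_, _, scratch_steps = train(0.0, 0.0, new_task)         # from scratch
_, _, finetune_steps = train(base_w, base_b, new_task)  # start from the base weights
print(f"from scratch: {scratch_steps} steps, fine-tuned: {finetune_steps} steps")
```

With these settings the fine-tuned run needs noticeably fewer steps because it starts close to the new optimum. Fine-tuning a large pretrained language model rests on the same idea at a vastly larger scale, though the actual savings depend on how related the domains are.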


solidasparagus 2569 days ago

Retrained from scratch? Why couldn't you just fine-tune the base model on Trump's tweets?


pdxww 2568 days ago

Maybe I don't understand something about these models. If the model was trained to mimic Trump tweets, it means that someone spent days of GPU time to find the weights of the model. Now if we want it to mimic HN comments, we'd need to spend the same amount of GPU time to find different weights. This is what I meant by "from scratch".


Reelin 2568 days ago

> ... if we want it to mimic HN comments, we'd need to spend the same amount of GPU time ...

These models are often much more general than you seem to be thinking. There's a base model which is incredibly computationally expensive to create from scratch. It is trained on a very large, very general set of data. Then there are specialized versions which are much cheaper to create - you start from the base model that you already have, and you train (much more briefly) on a specific set of data in order to tailor the output.

https://www.tensorflow.org/hub/tutorials/image_retraining

> Modern image recognition models have millions of parameters. Training them from scratch requires a lot of labeled training data and a lot of computing power (hundreds of GPU-hours or more). Transfer learning is a technique that shortcuts much of this by taking a piece of a model that has already been trained on a related task and reusing it in a new model.
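
The quoted idea, reusing "a piece of a model that has already been trained" and training only a small new part on top, can be sketched in a few lines. This is a toy analogy, not the TensorFlow Hub tutorial's code: the "pretrained" piece is a fixed, hand-written layer standing in for frozen layers, and only a new logistic-regression head is trained on the new task. All names and data are hypothetical.

```python
import math

# Stand-in for "a piece of a model that has already been trained":
# these weights are treated as frozen and never updated below.
FROZEN_WEIGHTS = ((1.0, 1.0), (1.0, -1.0))


def extract_features(point):
    """Frozen layer: fixed linear map followed by ReLU."""
    x, y = point
    return [max(0.0, wx * x + wy * y) for wx, wy in FROZEN_WEIGHTS]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(points, labels, lr=0.5, steps=5000):
    """Gradient descent on log-loss for the new head only; the extractor is untouched."""
    feats = [extract_features(p) for p in points]  # computed once, since the extractor never changes
    head, bias, n = [0.0, 0.0], 0.0, len(points)
    for _ in range(steps):
        grad, grad_bias = [0.0, 0.0], 0.0
        for f, label in zip(feats, labels):
            err = sigmoid(head[0] * f[0] + head[1] * f[1] + bias) - label
            grad[0] += err * f[0] / n
            grad[1] += err * f[1] / n
            grad_bias += err / n
        head = [head[0] - lr * grad[0], head[1] - lr * grad[1]]
        bias -= lr * grad_bias
    return head, bias


def predict(point, head, bias):
    f = extract_features(point)
    return 1 if head[0] * f[0] + head[1] * f[1] + bias > 0 else 0


# Hypothetical "new task": label a point 1 when x + y > 0.
points = [(1, 1), (2, 0.5), (0.5, 2), (1.5, -0.5), (-1, -1), (-2, -0.5), (-0.5, -2), (-1.5, 0.5)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
head, bias = train_head(points, labels)
print([predict(p, head, bias) for p in points])
```

Only `head` and `bias` change during training; in a real network the frozen piece would be many pretrained layers rather than one hand-written matrix.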
