by p1esk 2154 days ago
train a model that guesses whether some text was GPT-3 generated or human made - select samples that look the most human like.

What you said is essentially: "Train a better GPT model". Humans have trouble distinguishing (some) GPT-3 output from human writing. The only way to build a classifier that can do this is to build a model that is better than GPT-3 at understanding text. It would need features currently absent from GPT-3, such as common sense and an understanding of the world (causality, physics, psychology, history, etc.). If what you say could be done, GPT-3 would have been designed as a GAN.

It's a lot easier to notice logical mistakes in already written text, than it is to avoid making them in the first place. When you write text, do you write it in one pass, or do you reread it, fix mistakes, reformulate sentences, etc.? I have reformulated this piece of text at least once in order to make my argument clear.

That's the difference between GPT and BERT. GPT can attend only to past tokens, while BERT can also attend to future tokens.
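To make the GPT vs. BERT point concrete, here is a minimal sketch of the two attention masks in plain Python. The function names (`causal_mask`, `bidirectional_mask`, `masked_softmax`) are made up for illustration and do not come from either model's codebase; real implementations do this on tensors, with learned scores.

```python
import math

def causal_mask(n):
    # GPT-style: position i may attend to positions 0..i only.
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # BERT-style: every position may attend to every position.
    return [[True] * n for _ in range(n)]

def masked_softmax(scores, mask):
    # Softmax over each row, giving zero weight to masked-out positions.
    out = []
    for row_scores, row_mask in zip(scores, mask):
        top = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - top) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

n = 4
scores = [[0.0] * n for _ in range(n)]  # uniform raw scores, illustration only
gpt_weights = masked_softmax(scores, causal_mask(n))
bert_weights = masked_softmax(scores, bidirectional_mask(n))
```

With uniform scores, the first position under the causal mask puts all of its weight on itself, while under the bidirectional mask every position spreads its weight over the whole sequence.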

Now imagine that what you are going to say is not actually determined by you, but is sampled randomly from what seems like a reasonable thing to say. This is how GPT-3 works. If somebody asks you a yes/no question, you might estimate 70% yes and 30% no, then roll a 10-sided die to pick one, but once you have picked there is no way back.
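The die-rolling analogy can be sketched in a few lines. This is a toy, not GPT-3: `toy_model` is a hypothetical stand-in that returns hand-written probabilities, and the 70/30 split is just the number from the example above.

```python
import random

def sample_token(probs, rng):
    # Draw one token from a categorical distribution (the weighted die roll).
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(next_token_probs, prompt, steps, rng):
    # Autoregressive loop: each sampled token is appended and never revised.
    out = list(prompt)
    for _ in range(steps):
        out.append(sample_token(next_token_probs(out), rng))
    return out

def toy_model(context):
    # Hypothetical stand-in for a language model's next-token distribution.
    if context[-1] == "?":
        return {"yes": 0.7, "no": 0.3}
    if context[-1] in ("yes", "no"):
        # Whatever was picked, the continuation is conditioned on it.
        return {"because": 1.0}
    return {".": 1.0}

rng = random.Random(0)
sequence = generate(toy_model, ["?"], 3, rng)
```

Once "yes" or "no" has been sampled, everything after it is conditioned on that choice; nothing in the loop goes back to reconsider it.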

And I already mentioned that it does not address agency, grounding and multi-modality, but it could improve GPT's ability to formulate coherent arguments, follow instructions, write mathematical proofs and computer programs, or play games.

BTW - I actually have implemented it and it works quite reasonably.

Here are samples from GPT-2 small and GPT-2 small + RoBERTa adversarial decoder.

https://github.com/Isinlor/AdvDecoder/tree/master/outputs
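For readers who want the general shape of the idea, here is a minimal sketch of sample-and-rerank: draw several candidates from a generator, score each with a discriminator, keep the one that looks most human. This is not the code from the repository linked above, and it may differ from how that decoder works. `toy_generator` and `toy_discriminator` are hypothetical stand-ins for GPT-2 and a RoBERTa classifier.

```python
import random

def rerank(generate_candidate, human_likeness, n_candidates, rng):
    # Sample several candidates, keep the one the discriminator scores highest.
    candidates = [generate_candidate(rng) for _ in range(n_candidates)]
    return max(candidates, key=human_likeness)

# Hypothetical vocabulary, skewed so the toy generator tends to repeat itself.
WORDS = ["the", "cat", "sat", "on", "the", "mat", "mat", "mat"]

def toy_generator(rng):
    # Stand-in for a language model: six words drawn independently at random.
    return [rng.choice(WORDS) for _ in range(6)]

def toy_discriminator(tokens):
    # Stand-in for a human-vs-machine classifier: penalise immediate repeats.
    repeats = sum(a == b for a, b in zip(tokens, tokens[1:]))
    return -repeats

rng = random.Random(0)
best = rerank(toy_generator, toy_discriminator, 16, rng)
```

The generator itself is unchanged here; the discriminator only filters its outputs, so no gradients flow between the two models.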

It's a lot easier to notice logical mistakes in already written text, than it is to avoid making them in the first place

For a human who does logical thinking, yes. But for a language model? I'm actually not sure, because it's possible that a sufficiently complex language model like GPT-3 does form some kind of general logical rules encoded in its weights somehow. This would be interesting to explore.

I actually have implemented it and it works quite reasonably.

Oh, so you are trying to design GPT-2 like a GAN, or at least move in that direction. Interesting. Yes, I don't see why not. What do you think about taking it a step further and actually making it a GAN, i.e. propagating the error from the discriminator into the generator? I'm sure you're aware of multiple attempts to do this with smaller models, with mediocre results, but maybe GPT-3 scale is what's needed to make it work?