Hacker News new | ask | show | jobs
by Isinlor 2154 days ago
> It's not clear to me that larger models will solve this in the limit; take the way GPT3 fails on addition past a certain number, and the fundamental inability for transformers to learn certain algorithms.

GPT-3 was OpenAI exercise in how far pure scaling can get you. They have used some 2 years old method. Already at the point when they started training GPT-3 there were readily available remedies to many of GPT-3 issues. Given how they energized the wider community I'm sure even more focus will be given to improving language models in the following years.

Some rough ideas right now:

- People think that cherry-picking the best GPT-3 examples is cheating - why? Train a model that will be selecting the best examples for you. My proposition is to train a model that guesses whether some text was GPT-3 generated or human made - select samples that look the most human like.

- Use a good search method to look for the best samples. Monte Carlo Tree Search? AlphaZero? MuZero? If MuZero can play a games of Chess, Shogi, Go and all of Atari then way should it not be able to play a game of what word will come next?

- Hook up the language model to a search engine. Instead of writing a whole program yourself, why not to copy-paste some stuff from StackOverflow with some slight modifications?

Etc.

It doesn't address the issues with agency, grounding and multi-modality, but it's a good road map for the next 2-3 years.

1 comments

train a model that guesses whether some text was GPT-3 generated or human made - select samples that look the most human like.

What you said is essentially: "Train a better GPT model". Humans have trouble distinguishing between (some of) GPT-3 and human writing. The only way to build a classifier that can do this is to build a model that is better than GPT-3 at understanding text. It would need to have features currently absent in GPT-3, such as common sense and understanding the world (e.g. causality, physics, psychology, history, etc). If what you say could be done, GPT-3 would have been designed as a GAN.

It's a lot easier to notice logical mistakes in already written text, than it is to avoid making them in the first place. When you write text do you write it in one pass or do you read yourself and fix mistakes, reformulate sentences etc.? I have reformulated this piece of text at least once in order to make my argument clear.

That's the difference between GPT and BERT. GPT can only attend to the past outputs, while BERT one can attend also to the future outputs.

Now imagine that what you are going to say is not actually determined by you, but it is sampled randomly from what seems like a reasonable thing to say. This is how GPT-3 works. If somebody ask you some kind of question you can guess 70% yes or 30% no, then roll a 10 side dice to pick one, but once you pick there is no way back.

And I already mentioned that it does not address agency, grounding and multi-modality, but it could improve GPT ability to formulate coherent arguments, follow instructions, write mathematical proofs and computer programs or play games.

BTW - I actually have implemented it and it works quite reasonably.

Here are samples from GPT-2 small and GPT-2 small + RoBERTa adversarial decoder.

https://github.com/Isinlor/AdvDecoder/tree/master/outputs

It's a lot easier to notice logical mistakes in already written text, than it is to avoid making them in the first place

For a human who does logical thinking, yes. But for a language model? I'm actually not sure, because it's possible that a sufficiently complex language model like GPT-3 does form some kind of general logical rules encoded in its weights somehow. This would be interesting to explore.

I actually have implemented it and it works quite reasonably.

Oh, so you are trying to design GPT-2 like a GAN, or at least move into that direction. Interesting. Yes, I don't see why not. What do you think about taking a step further, and actually making it a GAN, i.e propagating the error from discriminator into the encoder? I'm sure you're aware of multiple attempts to do this with smaller models, with mediocre results, but maybe GPT-3 scale is what needed to make it work?