by scottlegrand2 2434 days ago

The funny thing for me is that if The Economist had just hacked the input to GPT-2 so that it was an ongoing conversation, they would have found that it's OK at holding a conversation, better than I expected when I tried it myself.

The conversation that I posted at:

https://medium.com/@scottlegrand/my-interview-with-the-world...

is performed in one shot. And I think it shows both the abilities and the limitations of GPT-2 and similar models. I am 100% role-playing with the language model and prompting it to go in the directions it goes, but it surprised me several times. Eventually it all fell apart, because I didn't perform any transfer learning on the model. I just used the raw GPT-2 XL model to measure whether further work would be worth the effort, and I would conclude that yes, it would be.

The first thing I need to do is dramatically increase the length of its input context. It's pretty good at running with an ongoing script, I suspect because much of its training data was formatted that way. But since I ran out of context tokens, it suffers several bouts of amnesia and eventually what is effectively multiple personality disorder. It also contradicts itself several times, but then no more than the typical thought leader or politician does IMO.
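To make the amnesia concrete, here is a minimal sketch of what happens when a running transcript outgrows a fixed context window. It is not the code used for the interview: the function name `fit_to_context`, the sample turns, and the whitespace word count (standing in for GPT-2's real tokenizer) are all assumptions for illustration. GPT-2's actual window is 1024 tokens; the example uses a tiny budget so the effect is visible.

```python
def fit_to_context(turns, max_tokens=1024):
    """Keep the most recent turns whose combined size fits the budget.

    Token cost is approximated by a whitespace word count, which is only
    a crude stand-in for a real tokenizer.
    """
    kept = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(turns):
        cost = len(turn.split())
        if used + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(turn)
        used += cost
    return list(reversed(kept))


# Made-up transcript, not real model output.
turns = [
    "Q: What is your name?",
    "A: My name is Ada.",
    "Q: Where do you live?",
    "A: I live in London.",
    "Q: What is your name again?",
]

# With a tiny budget, the earlier answer about the name no longer fits.
window = fit_to_context(turns, max_tokens=14)
print("\n".join(window))
```

Once the line where the model named itself falls outside the window, nothing in the prompt stops it from picking a different name, which is roughly the failure described above.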

What The Economist did was effectively erase the thing's memory between questions, so they were starting fresh with each question. I think that's why they had to take the best of 5.
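The difference between the two setups can be sketched as two ways of building prompts. This is only an illustration under assumptions: the `Q:`/`A:` script format, the function names, and the placeholder answer are hypothetical, and no model is actually called.

```python
def stateless_prompts(questions):
    """One fresh prompt per question: the model never sees earlier turns."""
    return [f"Q: {q}\nA:" for q in questions]


def conversational_prompts(questions, answers):
    """Prompts that carry the whole transcript so far.

    answers[i] is the reply already generated for questions[i]; it is
    appended to the transcript before the next question is asked.
    """
    prompts = []
    transcript = ""
    for i, q in enumerate(questions):
        transcript += f"Q: {q}\nA:"
        prompts.append(transcript)
        if i < len(answers):
            transcript += f" {answers[i]}\n"
    return prompts


questions = ["Who are you?", "What did you just say?"]
answers = ["I am a language model."]  # placeholder reply, not real output

fresh = stateless_prompts(questions)
ongoing = conversational_prompts(questions, answers)

print(fresh[1])
print("---")
print(ongoing[1])
```

In the stateless version the second question arrives with no history at all, so a follow-up like "What did you just say?" has nothing to refer to. In the conversational version the earlier exchange is part of the prompt, at least until the context window fills up.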