Hacker News
by darkkindness 2336 days ago
It's really weird to evaluate GPT-2 based on its ability to say things no reasonable person would ever say. If I were born in Cleveland I wouldn't be jumping to proclaim my fluency in English. If I told you I left my keys out at the pub, I wouldn't immediately repeat myself and say that my keys are now at the pub. If I'm talking about two trophies plus another trophy, I'd probably try to end it with some punchline rather than saying there are three trophies.

A lot of the things we write assume the reader can make connections on their own. That's a writing skill. It's the reason why Hemingway's famous "For sale: baby shoes, never worn" is so impactful. As such I've found GPT-2 to be incredible at writing fanfiction.

3 comments

The second thing I tried was:

"The square root of..."

I'm sure I've started sentences that way many many times. The results are pretty funny:

"The square root of four (e.g. 1.6 or 1.18) is 1,913,511."

The real immediate value of GPT-2 is human/computer collaboration. Think code completion or completion/prompts in other mediums, such as writing. Many art forms work - music / game design / painting / etc.

Thank you for saying this. This is something so many people miss when trying to test the limitations of GPT-2. It just doesn't make sense to test it on strings of text that nobody ever writes.

Just for fun and to make a point, I threw your reply into Talk to Transformer.

> This is something so many people miss when trying test the limitations of GPT-2. It just doesn't make sense to test it on strings of text that nobody ever writes. To me, the best way to evaluate the usefulness of GPT-2 is to compare it to some actual test that validates a lot of its claims. So... let's do just that.

It might be just chance, but gee -- is this text referring to its own generation as a test to convey a point? The self-referentiality is formidable.