Hacker News
by pas 2682 days ago
But ... it's not novel. We could already generate convincing gibberish years ago.

Now the novelty is that this can be better targeted. But even simple Markov-chain based text generators were good enough to fool people for a bit.

And there were always people who had too much free time to write. A lot. (See for example the crackpots and conspiracy theorists that bombard physics forums. See the 9/11 movies, the Zeitgeist movies, etc. See how much has been written about anti-vaxx, about quantum woo, etc.)

Reputation systems work pretty well for countering spammers.

And against APTs (advanced persistent threats, spear phishing attacks, etc.) there's no real "universal" protection anyway. (You need a competent security team to out-think and out-resource the attackers in every possible dimension.)

This AI is the same as the paid Russian trolls and the unpaid scammers, and so on.

3 comments

The OpenAI samples are leaps and bounds ahead of traditional Markov-chain generated text. I don't think you can compare the two. It's the fluency and plausibility that gives pause around a public release.

I agree with your last point though - it falls into the same category as paid Russian trolls. I think that's exactly why they were hesitant to release the pre-trained models - they didn't want to make it easier/cheaper for a bad actor to replicate the 2016 election.

It remains to be seen whether their decision will make an iota of a difference. But I understand their motivation.

>But ... it's not novel.

I work in this field, and yes, this is very novel (at least in terms of the quality).

It's the biggest improvement in quality I've ever seen. The long term coherence is so much better than anything else that has ever been built.

No, I'm sorry, I wasn't precise enough. Yes, it's an amazing feat of engineering, and a truly great peak of text generation. But it's that. Text generation.

Yes, it can serve as a great customized propaganda generator, and yes, people can be spun 'round and 'round with it. But they already can be with pretty much anything, from the simplest of phrases like "make X great again" to the elaborate scams of new age bullshit.

I simply disagree with others on the "virulence" or weaponization factor of this. (Especially when it comes to the possible "defenses": none can be "deployed" in 6 months. You can't teach critical thinking to billions of people overnight.)

I've worked in the computational propaganda field, and I tend to agree that there is no real known defense yet.

I don't have a strong opinion about whether they should have released this model or not.

I do know it would make a great commercial spam generator though. Want a million product reviews which seem legitimate, quickly? This is the thing.

Markov-chain generators are extremely lacking in long-term coherence. They rarely even make complete sentences, much less stay on topic! They were not convincing at all, whereas many of the GPT-2 samples are as "human-like" as average internet comments.
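For anyone who hasn't played with one, here is a minimal sketch of a word-level Markov-chain generator, to show why the coherence is so short-range. It is an illustration only: the function names `build_chain` and `generate`, the `order` parameter, and the toy corpus are all made up for this example. The next word depends only on the last `order` words, so nothing ties the end of a sentence back to its beginning.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain; the only 'memory' is the current state tuple."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)  # slide the window forward by one word
    return " ".join(output)


# Hypothetical toy corpus, just for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus, order=1)
print(generate(chain, length=10, seed=0))
```

Raising `order` makes the output locally more fluent, but on a small corpus it mostly ends up copying the training text verbatim, which is the usual trade-off with this approach.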

Conjecture: GPT-2 trained on reddit comments could pass a "comment turing test", where the average person couldn't distinguish whether a comment is bot or human with better than, say, 60% accuracy.

That's an indictment of reddit comments more than AI. Remember that conditioned on the human-provided seed prompt, there is no statistical surprise (the definition of information) in the generated text. If all reddit comments are riffs on the OP based on second-hand information, well then they may as well be bot-generated already.

At this stage, these AIs can only help. Imagine we are given this tool that can generate samples from the "uninformative but realistic looking text" distribution. We can then put it in a discriminator to filter out blabbering bots and humans together, or invert it to summarize the small kernel of information, and that would be a great thing. The better these models learn typical human behavior, the better off we are at identifying the truly exceptional. It's when AI starts to sense and incorporate novel information from the non-human environment that you really have to worry.

>That's an indictment of reddit comments more than AI.

Perhaps, but that's the world we live in. I suspect the average reddit commenter is already more articulate than the average person (citation needed, I know, but reddit skews toward highly educated young males in first-world countries. There's no way they do worse than a worldwide average).

Other than that, I agree with your comment.

I know they are extremely lacking, but compare that to a hyper-fancy NN with layers and layers of the darkest of black magic, trained at the zenith of the night for thousands of man-years in the crypts of the terror itself, the TPU ... yeah, so it's not surprising it's better.

But it's not symbolic reasoning. It's not constructing a counter-argument from your argument. It simply lives off previous epic rap battles of internet flamewar history about ... well, about anything, since it's the Internet, and people like to chat, talk, write essays on every topic there is. Satire too. So there is always something to build that language model on.

Though that will come too. Eventually.