Hacker News
by planb 1842 days ago
I get the feeling that now that it's so easy to generate text that sounds legit and is syntactically correct but makes no sense semantically, web search will be unusable soon.

Searching for answers to an obscure problem with a software library already leads to 5 Stack Overflow ripoff sites that at least contain the same text as the original. If these scammers can now generate their own texts (randomly including all possible keywords I might use to narrow my search), we will drown in useless information.

And this isn't restricted to programming; that was just an example of a field that is already full of scams. When it gets cheap enough, somebody will do this with any topic that people are interested in...

I wonder how Google and other search engines will tackle this problem.

3 comments

Oh my god. I fed that text to the model and asked what we can do about this. Here are its answers:

It is not so much the search engines that are at risk, but the people themselves. In a world where anyone with a $0.05 budget can just make something up, nobody will be able to trust anything anymore. It is not the search engines that will change the way we search, it will be the people, who will stop trusting anything they read. And that is a problem.

The problem is not with search engines. It's with people. Most people who get scammed will not be able to understand the text they see. They may think it's legit because it looks pretty, but they don't understand what the text means. The only way to find out the truth is by doing a little research and contacting the company to ask for a refund or to talk to their support. But, with enough of these scams, people will stop trusting anything they read.

I really like the brief bit of optimism that got brokenly encoded into this response, with the idea that people can find out the truth by first getting refunds for the lies they've been told.
Begun, the AI war, has.
This does make me wonder, how long until we get a GPT that can drop pop-culture references smoothly enough to pass the Turing Test?

It's harder than other types of text generation, because part of the game of referring to memes is to do it in as roundabout a way as you can, using the kind of hidden inferences a neural network would probably struggle with.

But it might still be feasible soon. *shudders*

Yeah, you wrote that it makes no sense semantically, but it does: it restated your exact argument. This is great for a non-native speaker, for example, who wants to rewrite a paragraph in more standard language. Plagiarism will definitely be harder, if not impossible, to detect.
The last paragraph wasn't mine. It was also part of the output :)
Google uses a lot of other signals besides page content to rank search results. Common examples:

1. Links from well-ranked pages.
2. Whether the user hits the back button or stays on the page.

Many of these should still work.
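To make the first signal concrete, here is a toy sketch of link-based ranking in the style of simplified PageRank. It is only an illustration of the idea, not how Google actually ranks anything; the page names and the `pagerank` helper are made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini web: nobody links to the ripoff site,
# even though it links to (and copies) the original.
toy_web = {
    "docs": ["original"],
    "blog": ["original", "docs"],
    "original": ["docs"],
    "ripoff": ["original"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the ripoff page only ever gets the baseline share, because its own text (generated or copied) does nothing to earn inbound links. That is the sense in which a signal like this might still work when page content can be faked cheaply.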

But they obviously don't work! That's why I mentioned all those Stack Overflow ripoff sites as an example. When scrolling the results, my brain filters them out because I can recognise them as scams, but if they contained unique and relevant text (generated by an AI), I'd be lost.
I actually like the Stack Overflow ripoff sites. They load quicker than the real Stack Overflow (which often seems to freeze due to broken ad scripts).
Google favors junk websites over Stack Overflow for the same reason it downranks all of its competitors.