Hacker News
by jerf 1574 days ago
Yes, I think we're still a couple of years from this becoming an intractable problem, but it's absolutely coming.

Startup entrepreneurs in the mood for a Hail Mary play, take note. How do you build a web search engine in a world where no algorithm exists any longer for telling spam apart from real content? "Go back to the original Yahoo" is a decent start, but certainly nowhere near a complete answer in 2022!

My guess is that it may not even take the form of what we have today, with an arbitrary text box. Maybe you have to go down to a specific category at least. Who knows. I sure don't. All I can say is that it sure looks to me like the spammers are only a year or two from effective total victory in the current paradigm.


Just because people cannot tell generated content from natural content doesn't mean ML classifiers can't. Training GPT-3 to recognize GPT-3 output is a lot simpler than you think (more specifically, we don't have a good way of sampling from the long tail when generating, which is something a model like GPT-3 can pick up on in a jiffy), especially since the vast majority of people won't be able to fine-tune the model enough to diverge from the base statistics. Throw in other features like domain trustworthiness, user click rate, etc., and search engines should remain fairly reliable for mainstream searches. If you are searching for niche content, though, yes, there could be degradation.
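To make the "long tail" argument concrete, here is a minimal sketch, not anyone's production system. It assumes some scoring language model has already produced, for each token of a page, that token's rank among the model's predictions (1 = the model's top guess). Sampled text rarely reaches deep into the tail, so a high fraction of top-k tokens hints at generation. That signal is then blended with the other features mentioned above. `PageSignals`, `top_k_fraction`, `spam_probability`, the `k=10` cutoff, and all weights are hypothetical and untrained, chosen only for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class PageSignals:
    # Hypothetical per-page features. token_ranks is assumed to come from an
    # external language model: rank 1 means the model's most likely next token.
    token_ranks: list[int]
    domain_trust: float  # assumed score in [0, 1]
    click_rate: float    # assumed score in [0, 1]


def top_k_fraction(token_ranks: list[int], k: int = 10) -> float:
    """Fraction of tokens that fall within the model's top-k predictions.

    Sampled text seldom reaches the long tail, so this tends to be higher
    for generated text than for human writing."""
    if not token_ranks:
        return 0.0
    return sum(1 for r in token_ranks if r <= k) / len(token_ranks)


def spam_probability(page: PageSignals,
                     weights: tuple[float, float, float] = (6.0, -3.0, -2.0),
                     bias: float = -3.0) -> float:
    """Logistic blend of the generation signal with trust and click features.

    Weights are illustrative placeholders, not learned from data."""
    w_gen, w_trust, w_click = weights
    z = (bias
         + w_gen * top_k_fraction(page.token_ranks)
         + w_trust * page.domain_trust
         + w_click * page.click_rate)
    return 1.0 / (1.0 + math.exp(-z))
```

For example, a page whose tokens all sit near the top of the model's predictions, on a low-trust domain with few clicks, scores well above 0.5, while a page with plenty of long-tail tokens on a trusted, well-clicked domain scores near zero. In practice the weights would be learned from labeled pages rather than hand-set.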
Spammers aren't going to just "fire GPT-3" at the problem and then quail in panic when it doesn't quite work, any more than they have with any other technique.

The problem is the space between "AI generated" and "human generated" is fundamentally getting smaller. That's a real problem. We don't seem to be that many steps away from that space going to zero for generalized writing.

Reporting for duty...