by VHRanger 217 days ago
> AI slop eventually will get as good as your average blogger. Even now if you put an effort into prompting and context building, you can achieve 100% human like results.

Hey, Kagi ML lead here.

For images/videos/sound, not at the current moment: diffusion models and GANs leave visible artifacts. There are some issues with edge cases, like high-resolution images that have been JPEG-compressed to hell, but even with those the framing of AI images tends to be pretty consistent.

For human slop there's a bunch of detection methods that bypass human checks:

1. Within the category of "slop", the vast mass of it is low effort. The majority of text slop is default-settings ChatGPT, which has a particular and recognizable wording and style.

2. Checking the source of the content instead of the content itself is generally a better signal.

For instance, is the author posting inhumanly often all of a sudden? Are they using particular WordPress page setups and plugins that are common with SEO spammers? What about inbound/outbound links to that page -- are they linked to by humans at all? Are they a random, new page doing a bunch of product reviews all of a sudden with Amazon affiliate links?

Aggregating a bunch of partial signals like this is much better than just scoring the text itself on the LLM perplexity score, which is obviously not a robust strategy.
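
To make the "aggregate partial signals" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the signal names, the weights, the bias, and the logistic squashing are hypothetical and not taken from Kagi's actual pipeline. In practice the weights would be fit on labeled data rather than hand-picked.

```python
import math
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical source-level signals for one page (names are illustrative)."""
    posts_last_week: int          # how many posts the author published this week
    posts_weekly_baseline: float  # the author's historical weekly average
    spam_plugin_hits: int         # plugins/themes matching a hypothetical spam list
    human_inbound_links: int      # inbound links judged to come from real people
    affiliate_link_ratio: float   # fraction of outbound links that are affiliate links
    domain_age_days: int          # age of the domain in days


# Hand-picked weights and bias, purely illustrative.
WEIGHTS = {
    "posting_burst": 1.5,
    "spam_plugins": 1.0,
    "no_human_links": 1.5,
    "affiliate_heavy": 1.5,
    "new_domain": 0.5,
}
BIAS = -3.0


def partial_scores(s: PageSignals) -> dict:
    """Map each raw signal to a weak score in [0, 1]."""
    burst = s.posts_last_week / max(s.posts_weekly_baseline, 1.0)
    return {
        "posting_burst": min(burst / 10.0, 1.0),
        "spam_plugins": min(s.spam_plugin_hits / 3.0, 1.0),
        "no_human_links": 1.0 / (1.0 + s.human_inbound_links),
        "affiliate_heavy": min(max(s.affiliate_link_ratio, 0.0), 1.0),
        "new_domain": 1.0 if s.domain_age_days < 90 else 0.0,
    }


def slop_score(s: PageSignals) -> float:
    """Combine the weak scores with a weighted sum and a logistic squash."""
    scores = partial_scores(s)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


suspicious = PageSignals(70, 2.0, 3, 0, 0.9, 20)
benign = PageSignals(2, 2.0, 0, 40, 0.0, 2000)
print(round(slop_score(suspicious), 2), round(slop_score(benign), 2))
```

No single signal decides the outcome here: a new domain or a burst of posts alone stays well under any reasonable threshold, and the score only climbs when several weak signals agree.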


> Are they using particular wordpress page setups and plugins that are common with SEO spammers?

Why doesn't Kagi go after these signals instead? Then you could easily catch a double-digit percentage of slop, maybe over half of it (AI-generated or not), without having to do crowdsourcing and other complicated setups. It's right there in the code. The same goes for emojis in YouTube video titles.

You’re responding to the Kagi ML lead. They are using those signals in addition to crowdsourcing.

Are you certain? I haven't seen this mentioned anywhere until now. And lots of SEO WordPress spam is still showing up in Kagi queries.

Yes, I'm the ML lead.

The current search engine doesn't go after WordPress plugins we consider correlated with bad pages.

By far the most efficient method in the search engine for spam is downranking by trackers, JavaScript weight, etc.
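
As a rough illustration of what downranking by trackers and JavaScript weight could look like, here is a small Python sketch. The function names, the coefficients, and the logarithmic form are all assumptions for illustration, not how the search engine actually computes it.

```python
import math


def quality_multiplier(tracker_count: int, js_bytes: int,
                       tracker_weight: float = 0.15,
                       js_weight: float = 0.10) -> float:
    """Return a multiplier in (0, 1]; heavier pages get a smaller value.

    Hypothetical formula: log-scaled so that the first few trackers matter
    more than the fortieth, and JS weight is measured in units of 100 kB.
    """
    tracker_term = tracker_weight * math.log1p(tracker_count)
    js_term = js_weight * math.log1p(js_bytes / 100_000)
    return 1.0 / (1.0 + tracker_term + js_term)


def rerank(results):
    """Sort (url, relevance, tracker_count, js_bytes) tuples by adjusted score."""
    return sorted(
        results,
        key=lambda r: r[1] * quality_multiplier(r[2], r[3]),
        reverse=True,
    )


results = [
    ("heavy.example", 1.0, 40, 5_000_000),  # more relevant, but tracker-laden
    ("light.example", 0.9, 0, 50_000),      # slightly less relevant, lightweight
]
for url, *_ in rerank(results):
    print(url)
```

Because the multiplier only scales the relevance score, a heavy page is pushed down rather than removed, which keeps the collateral damage limited when a legitimate site happens to ship a lot of scripts.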

SlopStop is going after page formats, but we didn't plan to scale that back to rankings for everyone quite yet, only to use it as features for detecting AI slop. Otherwise the collateral damage to good actors with bad websites would be risky early on.

> Yes, I'm the ML lead.

I never had any doubt about that ;)

What I meant by "are you certain" was how Kagi treats the spam signals from WordPress plugins and themes. And now you gave the answer, thanks for that! I believe you will get good returns from using those signals.