| HN Mirror


by laacz 217 days ago

Though I'm still pissed at Kagi about their collaboration with Yandex, this particular kind of fight against AI slop has always struck me as a bit of Don Quixote vs. windmills.

AI slop eventually will get as good as your average blogger. Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

I am terrified of AI generated content taking over and consuming search engines. But this tagging is more a fight against bad writing [by/with AI]. This is not solving the problem.

Yes, it's now often possible to distinguish AI slop from normal writing just by looking at it, but I am sure there is a lot of AI-generated content that is indistinguishable from content written by a mere human.

Also - are we 100% sure that we're not indirectly helping AI and people using it to slopify the internet by helping them understand what is actually good slop and what is bad? :)

We're in for a lot of false positives as well.

5 comments

VHRanger 217 days ago

> AI slop eventually will get as good as your average blogger. Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

Hey, Kagi ML lead here.

For images/videos/sound, not at the current moment: diffusion models and GANs leave visible artifacts. There are some issues with edge cases like high resolution images that have been JPEG compressed to hell, but even with those the framing of AI images tends to be pretty consistent.

For human slop there's a bunch of detection methods that bypass human checks:

1. Within the category of "slop" the vast mass of it is low effort. The majority of text slop is default-settings ChatGPT, which has a particular and recognizable wording and style.

2. Checking the source of the content instead of the content itself is generally a better signal.

For instance, is the author posting inhumanly often all of a sudden? Are they using particular WordPress page setups and plugins that are common with SEO spammers? What about inbound/outbound links to that page -- are they linked to by humans at all? Are they a random, new page doing a bunch of product reviews all of a sudden with Amazon affiliate links?

Aggregating a bunch of partial signals like this is much better than just scoring the text itself on the LLM perplexity score, which is obviously not a robust strategy.
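As an illustration of aggregating partial signals, here is a minimal sketch in Python. It is not Kagi's implementation: the feature names, weights, bias, and the 90-day "new site" cutoff are all hypothetical values chosen for the example, and a real system would presumably learn them from labeled data.

```python
import math
from dataclasses import dataclass


@dataclass
class SourceSignals:
    """Source-level signals about a page (all fields are hypothetical)."""
    posts_per_day: float           # recent posting rate
    baseline_posts_per_day: float  # the author's historical rate
    spam_plugin_count: int         # plugins/themes associated with SEO spam
    human_inbound_links: int       # inbound links that appear human-made
    domain_age_days: int
    affiliate_link_ratio: float    # share of outbound links that are affiliate links


# Hypothetical hand-picked weights; not taken from any real ranking system.
WEIGHTS = {
    "posting_burst": 2.0,
    "spam_plugins": 1.5,
    "no_human_links": 1.5,
    "new_affiliate_site": 2.0,
}
BIAS = -3.0


def extract_features(s: SourceSignals) -> dict:
    """Map raw signals to features in the range [0, 1]."""
    burst = s.posts_per_day / max(s.baseline_posts_per_day, 0.1)
    return {
        "posting_burst": min(burst, 10.0) / 10.0,
        "spam_plugins": min(s.spam_plugin_count, 3) / 3.0,
        "no_human_links": 1.0 if s.human_inbound_links == 0 else 0.0,
        # Affiliate links only count against sites younger than 90 days.
        "new_affiliate_site": s.affiliate_link_ratio if s.domain_age_days < 90 else 0.0,
    }


def slop_score(s: SourceSignals) -> float:
    """Combine the weak signals into one score in (0, 1) with a logistic function."""
    feats = extract_features(s)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))
```

No single feature decides the outcome here; a page has to trip several weak signals before its score gets high, which is the point of aggregating them.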

carlosjobim 217 days ago

> Are they using particular WordPress page setups and plugins that are common with SEO spammers?

Why doesn't Kagi go after these signals instead? Then you could easily catch a double-digit percentage of slop and maybe over half of slop (AI generated or not), without having to do crowd sourcing and other complicated setups. It's right there in the code. The same with emojis in YouTube video titles.

hananova 217 days ago

You’re responding to the Kagi ML lead. They are using those signals in addition to crowd sourcing.

carlosjobim 217 days ago

Are you certain? I haven't seen this mentioned anywhere, except for now. And lots of SEO WordPress spam is still showing up in Kagi queries.

VHRanger 217 days ago

Yes, I'm the ML lead.

The current search engine doesn't go after WordPress plugins we consider correlated to bad pages.

By far the most efficient method in the search engine for spam is downranking by trackers/javascript weight/etc.
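A minimal sketch of what downranking by tracker count and JavaScript weight could look like, assuming a multiplicative penalty applied to a relevance score. The function names, constants, and result fields below are hypothetical and only illustrate the idea; they are not Kagi's actual formula.

```python
def downrank_factor(tracker_count: int, js_kilobytes: float) -> float:
    """Return a multiplier in (0, 1]; pages with more trackers and more JS get a smaller one."""
    tracker_penalty = 1.0 / (1.0 + 0.15 * tracker_count)  # 0.15 is an assumed constant
    js_penalty = 1.0 / (1.0 + js_kilobytes / 2000.0)      # 2000 KB is an assumed scale
    return tracker_penalty * js_penalty


def rerank(results: list) -> list:
    """Sort results by relevance scaled by the page-weight penalty, best first."""
    return sorted(
        results,
        key=lambda r: r["relevance"] * downrank_factor(r["trackers"], r["js_kb"]),
        reverse=True,
    )
```

With these assumed constants, a page with no trackers and little JavaScript keeps nearly all of its relevance score, while a tracker-heavy page has to be far more relevant to outrank it.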

Slopstop is going after page formats but we didn't plan to scale that back to rankings for everyone quite yet, only use it as features to detect AI slop. Otherwise the collateral damage on good actors with bad websites would be risky early on.

carlosjobim 217 days ago

> Yes, I'm the ML lead.

I never had any doubt about that ;)

What I meant by "are you certain" was how Kagi treats the spam signals from WordPress plugins and themes. And now you gave the answer, thanks for that! I believe you will have good returns in using those signals.

immibis 217 days ago

If you're concerned about money ending up at companies that are taxed by countries that mass murder people, you should be as pissed about Google, Microsoft, DuckDuckGo, Boeing, Airbus, Walmart, Nvidia, etc... there is almost no company you should not be pissed off by.

I would be happy that Google is getting some competition. It seems Yandex created a search engine that actually works, at least in some scenarios. It's known to be significantly less censored than Google, unless the Russian government cares about the topic you're searching for (which is why Kagi will never use it exclusively).

abnercoimbre 217 days ago

> Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

Are we personally comfortable with such an approach? For example, if you discover your favorite blogger doing this.

umanwizard 217 days ago

> Are we personally comfortable with such an approach?

I am not, because it's anti-human. I am a human and therefore I care about the human perspective on things. I don't care if a robot is 100x better than a human at any task; I don't want to read its output.

Same reason I'd rather watch a human grandmaster play chess than Stockfish.

Marsymars 217 days ago

There are umpteen such analogies. Watching the world's strongest man lift a heavy thing is interesting. Watching an average crane lift something 100x heavier is not.

sjs382 217 days ago

I generally side with those that think that it's rude to regurgitate something that's AI generated.

I think I am comfortable with some level of AI-sharing rudeness though, as long as it's sourced/disclosed.

I think it would be less rude if the prompt was shared along with whatever was generated, though.

laacz 217 days ago

Should we care? It's a tool. If you can manage to make it look original, then what can we do about it? Eventually you won't be able to detect it.

ehnto 217 days ago

Objectively we should care because the content is not the whole value proposition of a blog post. The authenticity of the content, and trust in its validity, come from your connection to the human that made it.

I don't need to fact check a ride review from an author I trust, if they actually ride mountain bikes. An AI article about mountain bikes lacks that implicit trust and authenticity. The AI has never ridden a bike before.

Though that reminds me of an interaction with Claude AI. I was at the edge of its knowledge with a problem, and I could tell because I had found the exact forum post it quoted. I asked if this command could brick my motherboard, and it said "It's worked on all the MSI boards I have tried it on." So I didn't run the command. Mate, you've never left your GPU world; you definitely don't have the experience to back that claim.

cruffle_duffle 217 days ago

“It's worked on all the MSI boards I have tried it on.”

I love when they do that. It’s like a glitch in the matrix. It snaps you out of the illusion that these things are more than just a highly compressed form of internet text.

Marsymars 217 days ago

Haven't we given some AI agents access to potentially motherboard-bricking commands yet?

Brian_K_White 217 days ago

If your wife can't detect that you told your secretary to buy something nice, should she care?

cschep 217 days ago

This is an absurd comparison - you (presumably) made a commitment to your wife. There is no such commitment on a public blog?

SkyBelow 217 days ago

Is it that absurd?

We have many expectations in society which often aren't formalized into a stated commitment. Is it really unreasonable to have some commitment towards society to these less formally stated expectations? And is it unreasonable to expect that communication presented as human to human actually comes from a human? I think not.

If you were to find out that the people replying to you were actually bots designed to keep you busy and engaged, feeling a bit betrayed by that seems entirely expected. Even though at no point did those people commit to you that they weren't bots.

Letting someone know they are engaging with a bot seems like basic respect, and I think society benefits from having such a level of basic respect for each other.

It is a bit like the spouse who says "well I never made a specific commitment that I would be the one picking the gift". I wouldn't like a society where the only commitments are those we formally agree to.

cschep 216 days ago

I do appreciate this side of the argument but.. do you think that the level/strength of a marriage commitment is worthy of comparison to walking by someone in public / riding the same subway as them randomly / visiting their blog?

They seem worlds apart to me!

Vegenoid 217 days ago

There are many discussions of what sets apart a high trust society from a low trust society, and how a high trust society enables greater cooperation and positive risk taking collectively. Also about how the United States is currently descending into a low trust society.

"Random blog can do whatever they want and it's wrong of you to criticize them for anything because you didn't make a mutual commitment" is low-trust society behavior. I, and others, want there to be a social contract that it is frowned upon to violate. This social contract involves not being dishonest.

recursive 217 days ago

Norms of society.

I made no commitment that says I won't intensely stare at people on the street. But I just might be a jerk if I keep doing it.

"You're not wrong, Walter. you're just an asshole."

Brian_K_White 217 days ago

Illuminating that you think the illustrated problem has something to do with a commitment.

harimau777 217 days ago

We should care if it is lower in quality than something made by humans (e.g. less accurate, less insightful, less creative, etc.) but looks like human content. In that scenario, AI slop could easily flood out meaningful content.

yifanl 217 days ago

I am 100% comfortable with anybody who openly discloses that their words were written by a robot.

onion2k 217 days ago

I don't care one bit if the content is interesting, useful, and accurate.

The issue with AI slop isn't with how it's written. It's the fact that it's wrong, and that the author hasn't bothered to check it. If I read a post and find that it's nonsense I can guarantee that I won't be trusting that blog again. At some point my belief in the accuracy of blogs in general will be undermined to the point where I shift to only bothering with bloggers I already trust. That is when blogging dies, because new bloggers will find it impossible to find an audience (assuming people think as I do, which is a big assumption to be fair).

AI has the power to completely undo all trust people have in content that's published online, and do even more damage than advertising, reviews, and spam have already done. Guarding against that is probably worthwhile.

immibis 217 days ago

Even if it's right there's also the factor of: why did you use a machine to make your writing longer just to waste my time? If the output is just as good as the input, but the input is shorter, why not show me the input?

sjs382 217 days ago

> AI slop eventually will get as good as your average blogger. Even now if you put effort into prompting and context building, you can achieve 100% human-like results.

In that case, I don't think I consider it "AI slop"—it's "AI something else". If you think everything generated by AI is slop (I won't argue that point), you don't really need the "slop" descriptor.

laacz 217 days ago

Then the fight Kagi is proposing is against bad AI content, not AI content per se? Then that's very subjective...

sjs382 217 days ago

I don't pretend to speak for them, but I'm OK in principle dealing in non-absolutes.

Thrymr 217 days ago

Explicitly in the article, one of the headings is "AI slop is deceptive or low-value AI-generated content, created to manipulate ranking or attention rather than help the reader."

So yes, they are proposing marking bad AI content (from the user's perspective), not all AI-generated content.

laacz 217 days ago

Which troubles me a bit, as 'bad' does not have the same definition for everyone.

Thrymr 217 days ago

How is this any different from a search engine choosing how to rank any other content, including penalizing SEO spam? I may not agree with all of their priorities, but I would welcome the search engine filtering out low quality, low effort spam for me.

feedyourhead 216 days ago

Yes, that's why we'll publish a blog post on this subject in the coming weeks. We've been working on this topic since the beginning of summer, and right now our focus is on exploring report patterns.

Matt also shared insights about the other signals we use for this evaluation here https://news.ycombinator.com/item?id=45920720

And we are still exploring other factors:

1/ is the reported content AI-generated?

2/ is most content in that domain AI-generated (+ other domain-level signals) ==> we are here

3/ is it unreviewed? (no human accountability, no sources, ...)

4/ is it mindlessly produced? (objective errors, wrong information, poor judgement, ...)
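To make step 2/ concrete, here is a minimal sketch of rolling page-level verdicts up to the domain level. It assumes a list of (URL, is-AI-generated) pairs is already available from step 1/; the function names, the 0.5 threshold, and the minimum page count are hypothetical choices for illustration, not the values used in practice.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_ai_share(page_labels):
    """Given (url, is_ai_generated) pairs, return {domain: (ai_pages, total_pages)}."""
    counts = defaultdict(lambda: [0, 0])
    for url, is_ai in page_labels:
        domain = urlparse(url).netloc.lower()
        counts[domain][0] += int(is_ai)
        counts[domain][1] += 1
    return {domain: (ai, total) for domain, (ai, total) in counts.items()}


def mostly_ai_domains(page_labels, threshold=0.5, min_pages=3):
    """Return domains where more than `threshold` of the labeled pages are AI-generated.

    Domains with fewer than `min_pages` labeled pages are skipped, so a single
    report cannot condemn a whole site.
    """
    return {
        domain
        for domain, (ai, total) in domain_ai_share(page_labels).items()
        if total >= min_pages and ai / total > threshold
    }
```

The minimum page count is one simple way to limit false positives at the domain level; other domain-level signals would be combined with this share rather than replaced by it.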

SllX 217 days ago

There’s a whole genre of websites out there that are a ToC and a series of ChatGPT responses.

I take it to mean they’re targeting that shit specifically and anything else that becomes similarly prevalent and a plague upon search results.

harimau777 217 days ago

A simple definition would be: It's bad if it isn't labeled as AI content or if there is not a mechanism that allows you to filter out AI content.

sjs382 217 days ago

That's fine.

JumpCrisscross 217 days ago

> AI slop eventually will get as good as your average blogger

At that point, the context changes. We're not there yet.

Once we reach that point, if we reach it, it's valuable to know who is repeating thoughts I can get for pennies from a language model and who is originally thinking.