

by heyitsguay 459 days ago

There are ways to combat it: LLM-generated text leaves statistical fingerprints that appear to persist across successive generations of large foundation models.

I'm working on Binoculars with some UMD and CMU folks and wanted to test it out on this. I downloaded one bot's comment history (/u/markusrorscht). Only 30% of its comments were rated human-like, compared to 95-100% of comments from a few human users.

So, practically speaking, statistical methods can still fingerprint LLM-generated text, and the fingerprint gets more reliable as comment history gets longer. These methods can also be combined with other bot detection methods. IMO bot detection will stay a cat-and-mouse game, rather than (LLM-powered) bots winning the whole thing.
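
A rough sketch of the per-account aggregation described above, assuming every comment already has a numeric detector score where higher means more human-like. The function names, the threshold, and the 0.5 cutoff are hypothetical placeholders for illustration; this does not call Binoculars or reproduce its scoring.

```python
from typing import Sequence


def human_like_fraction(scores: Sequence[float], threshold: float) -> float:
    """Return the share of comments scoring at or above `threshold`.

    `scores` holds one detector score per comment (assumed: higher = more
    human-like). `threshold` is a hypothetical cutoff, not a Binoculars value.
    """
    if not scores:
        raise ValueError("need at least one scored comment")
    human_like = sum(1 for s in scores if s >= threshold)
    return human_like / len(scores)


def looks_like_bot(
    scores: Sequence[float],
    threshold: float,
    min_human_fraction: float = 0.5,  # hypothetical account-level cutoff
) -> bool:
    """Flag an account when too few of its comments rate as human-like."""
    return human_like_fraction(scores, threshold) < min_human_fraction


# Made-up scores for two accounts, ten comments each.
bot_scores = [0.2, 0.3, 0.95, 0.4, 0.1, 0.3, 0.2, 0.97, 0.5, 0.99]
human_scores = [0.95, 0.92, 0.99, 0.91, 0.97, 0.93, 0.98, 0.94, 0.96, 0.90]

print(looks_like_bot(bot_scores, threshold=0.9))
print(looks_like_bot(human_scores, threshold=0.9))
```

Deciding on the fraction over a whole history, rather than on any single comment, is what lets a longer history help: a few misclassified comments move the fraction less as the number of comments grows.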


butlike 459 days ago

Interesting, thanks for the insight!
