Hacker News new | ask | show | jobs
by flir 236 days ago
> I mean how does it know that though?

Not a clue. But apparently it does. Try a few nonsense texts yourself, see if it rejects them.

I'm saying that if you're spidering the whole web, then training an LLM on that corpus, asking an existing LLM "does this page make sense?" is a comparatively small additional load.

> guess with high efficiency

Yes, I think that's basically what's happening. Markov nonsense is cheap to produce, but easy to classify. A more subtle strategy might be more successful (for example someone down-thread mentions using LLM-generated text, and we know that's quite a hard thing to classify).