by Borealid 374 days ago
Classifying a behaviour into either "dangerous" or "not dangerous" is a perfect example of non-generative AI (what was previously called Machine Learning). The output isn't intended to be a textual description, it's a binary yes/no.

You can use an LLM to do that, but a specific ML model trained on the same dataset would likely be better on every quantitative metric, and that tech was available long before transformers stepped onto the stage.
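To make "a specific ML model" concrete, here is a minimal sketch of a binary yes/no text classifier of the pre-transformer kind (bag-of-words Naive Bayes with add-one smoothing). The function names and the four toy training examples are invented for illustration, and it assumes both labels appear in the training data. It says nothing about how such a model would actually compare to an LLM.

```python
import math
from collections import Counter

def train_naive_bayes(examples):
    """examples: list of (text, label) pairs with label 1 = flag, 0 = don't flag."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}

def predict(model, text):
    """Return 1 if the 'flag' class has the higher log-probability, else 0."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    scores = {}
    for label in (0, 1):
        # Log prior for the class.
        score = math.log(model["doc_counts"][label] / total_docs)
        total_words = sum(model["word_counts"][label].values())
        for word in text.lower().split():
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][word]
            # Add-one (Laplace) smoothing avoids log(0) for unseen word/class pairs.
            score += math.log((count + 1) / (total_words + vocab_size))
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

# Toy data, purely illustrative.
TOY_EXAMPLES = [
    ("how to build a weapon", 1),
    ("plan the attack tonight", 1),
    ("recipe for apple pie", 0),
    ("plan the picnic on sunday", 0),
]
model = train_naive_bayes(TOY_EXAMPLES)
```

The output is a bare 0 or 1, not text: `predict(model, "attack with a weapon")` gives 1 and `predict(model, "apple pie recipe")` gives 0 on this toy data.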

4 comments

And the easiest way to train such a specific ML model today is to take an LLM and use it to generate various examples of subversive content to train on.

However, I wouldn't be so sure that an LLM with CoT would be less effective at this than a specially-trained ML model.

Further, given that a sufficiently advanced model of this nature necessarily has to understand the meaning of human text, including context and subtleties, you'd probably want to take an LLM as a basis for training any such model (just as e.g. text embedding models these days are often specialized LLMs for similar reasons).

In any case, a realistic deployment at scale would employ multiple levels, starting with really simple classification models that are very fast and broadly low-precision (but trained to err on the side of flagging content). Anything flagged at that level would be fed into larger models, and so on. At the top of this chain you would likely have SOTA LLMs doing very detailed reviews of the few bits of data that get flagged by all the levels below.
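The multi-level idea above can be sketched roughly like this. Everything here is a hypothetical stand-in: the three scorers, the watch words and the thresholds are made up, and `llm_review_score` is a placeholder where a real LLM call would go. The only point is the control flow: cheap stages run on everything, expensive ones only on what survives.

```python
from typing import Callable, List, Tuple

# (stage name, scorer returning a value in [0, 1], threshold to escalate)
Stage = Tuple[str, Callable[[str], float], float]

WATCH_WORDS = ("attack", "weapon")  # hypothetical

def keyword_score(text: str) -> float:
    """Level 1: very fast, over-flags on purpose (substring match)."""
    lowered = text.lower()
    return 1.0 if any(w in lowered for w in WATCH_WORDS) else 0.0

def medium_model_score(text: str) -> float:
    """Level 2: stand-in for a mid-sized model; here just watch-word density."""
    tokens = text.lower().split()
    hits = sum(t in WATCH_WORDS for t in tokens)
    return hits / max(len(tokens), 1)

def llm_review_score(text: str) -> float:
    """Level 3: placeholder for an expensive LLM review (no real API call)."""
    return 1.0 if "tonight" in text.lower() else 0.0

def run_cascade(text: str, stages: List[Stage]) -> Tuple[bool, str]:
    """Return (flagged, name of the last stage that ran).

    Content is flagged only if every stage scores at or above its threshold;
    the first stage that scores below it clears the content and stops early.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    last = ""
    for name, scorer, threshold in stages:
        last = name
        if scorer(text) < threshold:
            return False, name
    return True, last

STAGES: List[Stage] = [
    ("keyword", keyword_score, 0.5),
    ("medium", medium_model_score, 0.2),
    ("llm", llm_review_score, 0.5),
]
```

With these stand-ins, a sentence about "heart attack risk" gets past the keyword level but is cleared by the second one, so the expensive top level never sees it.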

An LLM is needed to rationalize each unique classification en masse, and write the warrants.
AI can also nudge content choices towards autarch-sanctioned beliefs without the viewer being aware of it.

This has been happening for decades already. But AI can make it personal in a way that mass media can't.

Combine it with the kinds of psychological triggers and manipulations used in PR and advertising and you can convert almost anyone. You don't even need violence - just repetition.

This has already happened, btw. The Q phenomenon successfully radicalised entire demographics through careful use of emotional triggers and techniques to enhance suggestibility and addictiveness.

Seems unlikely to me. Would be really creepy to see a chart comparing the accuracy of both methods.

Are there any Natural Language Processing fields today that openly boast about higher performance than LLMs, with experimental results? If there were, they'd probably be in benchmarks.

The difference is you need a lot of training data to do that. Instead, now you can just tweak a system prompt and adapt it to whatever new policy you want to implement.