| HN Mirror


by int_19h 374 days ago

And the easiest way to train such a specific ML model today is to take an LLM and use it to generate various examples of subversive content to train on.

However, I wouldn't be so sure that an LLM with CoT would be less effective at this than a specially trained ML model.

Further, given that a sufficiently advanced model of this nature necessarily has to understand the meaning of human text, including context and subtleties, you'd probably want to take an LLM as a basis for training any such model (just as e.g. text embedding models these days are often specialized LLMs for similar reasons).

In any case, a realistic deployment at scale would employ multiple levels, starting with really simple classification models that are very fast and broadly low-precision, but tuned for high recall (trained to err on the side of flagging content). Any content flagged at one level would be fed into the larger models at the next, and so on. At the top of this chain you would likely have SOTA LLMs doing very detailed reviews of the few bits of data that get flagged by all the levels below.
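To make the multi-level idea concrete, here is a rough Python sketch of such a cascade. Everything in it is an assumption for illustration: the `Stage` type, the three scorer functions, the cue words, the thresholds, and the relative costs are all made up, and the "models" are trivial stand-ins rather than real classifiers or LLM calls. The only point is the control flow: each item stops at the first level that clears it, so the expensive top level only sees what every cheaper level flagged.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A scorer maps text to a suspicion score in [0, 1].
# All scorers below are hypothetical stand-ins, not real models.
Scorer = Callable[[str], float]


@dataclass
class Stage:
    name: str
    score: Scorer
    threshold: float  # escalate to the next stage when score >= threshold
    cost: float       # made-up relative cost of running this stage once


def run_cascade(stages: List[Stage], text: str) -> Tuple[bool, float, List[str]]:
    """Return (flagged_by_every_stage, total_cost, names_of_stages_run)."""
    total_cost = 0.0
    ran: List[str] = []
    for stage in stages:
        total_cost += stage.cost
        ran.append(stage.name)
        if stage.score(text) < stage.threshold:
            return False, total_cost, ran  # cleared here, stop early
    return True, total_cost, ran


def keyword_score(text: str) -> float:
    # Cheapest level: fires on any single cue word (fast, low precision).
    cues = ("urgent", "transfer", "password")
    t = text.lower()
    return 1.0 if any(c in t for c in cues) else 0.0


def mid_model_score(text: str) -> float:
    # Stand-in for a small trained classifier: fraction of cue words present.
    cues = ("urgent", "transfer", "password", "immediately")
    t = text.lower()
    return sum(c in t for c in cues) / len(cues)


def llm_review_score(text: str) -> float:
    # Stand-in for an expensive LLM review; a real one would call a model.
    t = text.lower()
    return 0.9 if "transfer" in t and "password" in t else 0.1


STAGES = [
    Stage("keyword", keyword_score, 0.5, 1.0),
    Stage("mid", mid_model_score, 0.5, 50.0),
    Stage("llm", llm_review_score, 0.5, 5000.0),
]

if __name__ == "__main__":
    for msg in (
        "Lunch at noon?",
        "Urgent: the meeting moved",
        "Urgent: transfer funds and send your password immediately",
    ):
        print(run_cascade(STAGES, msg), msg)
```

With these toy numbers, an item cleared at the first level costs 1 unit while one that goes all the way up costs 5051, which is why the bottom level can afford to over-flag as long as it rarely misses anything.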