Hacker News
by wongarsu 4 days ago
But this knowledge is readily accessible today, at least for Manhattan Project-level bombs. For later developments you mostly get simplified overviews with important details left out. But even there you have communities speculating about this very publicly.

The same is true for adjacent topics. Most LLMs will refuse to tell you how to make dynamite, and YouTube demonetises any videos about it, but it's right there in the Wikipedia articles on dynamite and nitroglycerine.

2 comments

I think they mentioned increasing false positives because AI would make it easier to generate them at mass scale. IDK the merits of the argument or what exactly they're saying would be done, but imagine: pre-AI, it might take someone quite a bit of manual effort to manufacture a plausible document regarding nuclear developments, whereas with AI it takes far less work.

The main trick governments use isn't to hide the knowledge of how to build the stuff. Rather, it's to ban the sale of precursor chemicals and specialized devices (think: industrial-scale centrifuges) except through a government-observable, KYC/AML-like chain-of-custody tracking scheme that assumes/requires each intermediary and final consumer to be an organization certified as meeting certain security requirements.

Individuals obviously need not apply. But regular companies need not apply, either. Think "checkpoints and sign-out sheets that ensure that your own company will notice if some of this stuff disappears." Picture the sort of thing your mind might conjure if you've watched enough forensics protocol dramas and I say "evidence locker" and "tamper-evident seals" — except crossed with hazardous-materials handling policies.

The thing is, this whole chain-of-custody system can be pretty easily circumvented. I won't go deep into how (I'll just say: 1. there are principal-agent problems in academia, and 2. this system wasn't designed to handle sudden organizational bankruptcies well). But the point is that a grey market for these precursor chemicals and specialized devices exists.

The main source of "false positive" events that the state has to look into is people who manage to acquire precursor chemicals/devices without being part of any known chain of custody. (Which, note, doesn't mean that they did anything illegal per se. If it turns out they're just, say, a chemistry-education content creator, then the intelligence body just adds them to its knowledge graph and otherwise leaves them be. But it does have to do some interviewing to determine that first.)

To minimize the number of such events, the "knowledge" that is truly being suppressed here isn't actually the knowledge of how to do the work; it's the knowledge of how to circumvent the chain-of-custody system. In other words: the logistics.

Information about "how to make a nuke" is general and evergreen; you can just absorb the lesson once and be good. So that info is just "out there", irrevocably. But information about "how to acquire the stuff to make a nuke" is both at least somewhat local to the country you're trying to do it from/in, and also changes all the time, as each state chases up and shuts down existing grey-market channels, and then new ones spring up to replace them. Thus, suppressing logistical knowledge is actually both useful and tractable. And so that's what states mostly go after.

(Mind you, the knowledge of "how to do the thing" does often end up roped into this knowledge-suppression scheme by overzealous downstream regulators who don't understand the load-bearing assumptions of the system they're working under.)

---

The worry states have about LLMs, I think, is that simply by scraping the web into a training dataset, they'll end up stumbling onto the right conversations (that sometimes do indeed happen anonymously in public) to end up with fresh + local chain-of-custody circumvention-logistics knowledge. (And it'd be very hard to "unpick" data like that from the training data.)

Or, even if they don't ingest the data at training time, they'll ingest "the places where that kind of info might end up", and thereby get so good at being "runtime demand-driven searching-and-scraping engines" for this type of thing that they'll be able to surface fresh sources of such info anyway, basically cranking the logistical-pipeline "reconnection time" after state disruption of a supply channel down to near-zero.

Prohibiting the LLMs from speaking on this subject in general prevents them, in particular, from enabling this fast-turnaround circumvention-logistics research use-case.