| Ultimately its a millennia old fight between manipulators and rationalism. The bead maze toy versus the abbacus. Twisted contorted reasoning versus formal verification. Choose your abbacus. The black box nature of machine learning models is not the issue. Instead of training to imitate vacuous conjectures and claims as humans on average typically do, they could be trained to do automated theorem proving, AlphaZero style. A minimalistic verifier like metamath is available free for download, including set.mm and a freely available book. It would be hard to purge from civilization. Currently its math database is collaboratively worked on at github. In theory a blockchain could host it. Fermat style challenging could be used to objectively assess the value of theorems: the longer it matures unproven as a challenge on the chain, the higher the reward if someone finally proves it. This inevitably creates an incentive to enter and digitize known mathematics into machine readable form, which will be easy for machine learning to accomplish. Machine learning empowered automated theorem proving will become a profitable business, with the fruits available for all to benefit from. Well, cryptography and protocols will also appear. So during training the machine learning models will get endless bedtime stories about Alice, Bob and Eve. Using conventional forward or backward chaining combined with adversarial models, one can construct arbitrary provable theorems, and negate it, then hide or propagate the negation so its not simply the first symbol in the theorem. So we can train models to challenge each other Fermat style, about the truth or falsehood of a statement, and demand proof. We can thus construct artificial mathematical systems with known inconsistencies and train models to seek a proof that the system contains an inconsistency. Such a proof will depend on the conflicting axioms. Hence the models will be our best tool to detect and resolve hypocrisy. The literal meaning of "apocalypse" is "revelation" or "uncovering", not "big tragedy"... The verification algorithm, for example the ~300 LoC python implementation by Raph Levien, owes most of its length to parsing the metamath file format. The actual Kolmogorov complexity of the verification algorithm itself is much smaller. There won't be any bits to "align". All these hopeless attempts at trying to align the intuition component of the machine learning model, instead of training it to gain intuition in producing logical derivations. The real horror of the control freaks is not that their alignment mechanisms might fail, but that its impossible to bias the verification algorithm itself, that its impossible to perpetuate the conflicts of interest, that any additional code in the verifier is immediately suspect, especially if it obviously skips all checks and dogmatically accepts a statement if its signed by a hardcoded "right" key. The objective judge will be mechanized. "abaccus akbar!" |