yeah but it doesn't understand the exact syntax on an absolute level, does it...? I understood this to be the same as any language model applied to programming languages (aka Formal Languages). Is that mistaken?
As far as I understand, and I may be wrong here, the system is composed of two networks: Gemini and AlphaZero. Gemini, being an ordinary LLM with some fine-tunes, only does translation from natural to formal language. Then, AlphaZero solves the problem. AlphaZero, unburdened with natural language and only dealing with "playing a game in the proof space" (where the "moves" are commands to the Lean theorem prover), does not hallucinate in the same way an LLM does because it is nothing like an LLM.
Yes, but the problem space means that invalid outputs can be quickly identified - whereas general programming isn’t necessarily amenable to rapid checks.
I mean, aren’t you just describing formal language syntax? Seems like a fundamentally similar situation —- the computer can automatically flag any syntax errors in a millisecond by checking it against the generating grammar for that language. Thats what makes a formal language in the first place, I think!
I do think this language is considerably more robust than the typical programming language, which means a sound program is more likely to end up being valid/“correct”. But still, that’s a difference of degree, not kind, IMO
I don’t mean syntax errors - I mean the difficulty of validating code that contains side effects (like http requests, database access etc).
Validating a math proof either terminates in a reasonable time (in which case it’s useful for training), or does not (in which case the AI should be discouraged from using that approach).